Foreword



ETAPS 2000 was the third instance of the European Joint Conferences on Theory 
and Practice of Software. ETAPS is an annual federated conference that was 
established in 1998 by combining a number of existing and new conferences. 
This year it comprised five conferences (FOSSACS, FASE, ESOP, CC, TACAS),
five satellite workshops (CBS, CMCS, CoFI, GRATRA, INT), seven invited 
lectures, a panel discussion, and ten tutorials. 

The events that comprise ETAPS address various aspects of the system development
process, including specification, design, implementation, analysis, and
improvement. The languages, methodologies, and tools which support these
activities are all well within its scope. Different blends of theory and practice are
represented, with an inclination towards theory with a practical motivation on 
one hand and soundly-based practice on the other. Many of the issues involved 
in software design apply to systems in general, including hardware systems, and 
the emphasis on software is not intended to be exclusive. 

ETAPS is a loose confederation in which each event retains its own identity, 
with a separate program committee and independent proceedings. Its format is 
open-ended, allowing it to grow and evolve as time goes by. Contributed talks 
and system demonstrations are in synchronized parallel sessions, with invited 
lectures in plenary sessions. Two of the invited lectures are reserved for “unifying”
talks on topics of interest to the whole range of ETAPS attendees. The
aim of cramming all this activity into a single one-week meeting is to create a 
strong magnet for academic and industrial researchers working on topics within 
its scope, giving them the opportunity to learn about research in related areas, 
and thereby to foster new and existing links between work in areas that were 
formerly addressed in separate meetings. The program of ETAPS 2000 included 
a public business meeting where participants had the opportunity to learn about
the present and future organization of ETAPS and to express their opinions
about what is bad, what is good, and what might be improved. 

ETAPS 2000 was hosted by the Technical University of Berlin and was efficiently
organized by the following team:

Bernd Mahr (General Chair) 

Hartmut Ehrig (Program Coordination) 

Peter Pepper (Organization) 

Stefan Jähnichen (Finances)

Radu Popescu-Zeletin (Industrial Relations) 

with the assistance of BWO Marketing Service GmbH. The publicity was superbly
handled by Doris Fähndrich of the TU Berlin with assistance from the
ETAPS publicity chair, Andreas Podelski. Overall planning for ETAPS conferences
is the responsibility of the ETAPS steering committee, whose current
membership is:

Egidio Astesiano (Genova), Jan Bergstra (Amsterdam), Pierpaolo Degano
(Pisa), Hartmut Ehrig (Berlin), José Fiadeiro (Lisbon), Marie-Claude
Gaudel (Paris), Susanne Graf (Grenoble), Furio Honsell (Udine), Heinrich
Hußmann (Dresden), Stefan Jähnichen (Berlin), Paul Klint (Amsterdam),
Tom Maibaum (London), Tiziana Margaria (Dortmund), Ugo Montanari
(Pisa), Hanne Riis Nielson (Aarhus), Fernando Orejas (Barcelona),
Andreas Podelski (Saarbrücken), David Sands (Göteborg), Don Sannella
(Edinburgh), Gert Smolka (Saarbrücken), Bernhard Steffen (Dortmund),
Wolfgang Thomas (Aachen), Jerzy Tiuryn (Warsaw), David Watt (Glasgow),
Reinhard Wilhelm (Saarbrücken)

ETAPS 2000 received generous sponsorship from: 

the Institute for Communication and Software Technology of TU Berlin
the European Association for Programming Languages and Systems
the European Association for Theoretical Computer Science
the European Association for Software Development Science
the “High-Level Scientific Conferences” component of the European
Commission’s Fifth Framework Programme

I would like to express my sincere gratitude to all of these people and organizations,
the program committee members of the ETAPS conferences, the organizers
of the satellite events, the speakers themselves, and finally Springer-Verlag for
agreeing to publish the ETAPS proceedings.
Donald Sannella
ETAPS Steering Committee chairman




Preface 



This volume contains the 27 papers presented at ESOP 2000, the Ninth European 
Symposium on Programming, which took place in Berlin, March 27-31, 2000. 
The ESOP series originated in 1986 and addresses the design, specification, and 
analysis of programming languages and programming systems. Since 1998, ESOP 
has belonged to the ETAPS confederation. 

The call for papers of ESOP 2000 encouraged the following topics: programming
paradigms and their integration, including concurrent, functional, logic,
and object-oriented; computational calculi and semantics; type systems, program
analysis, and concomitant constraint systems; program transformation;
programming environments and tools.

The volume starts with a contribution from Martin Odersky, the invited 
speaker of the conference. The remaining 26 papers were selected by the program 
committee from 84 submissions (almost twice as many as for ESOP 99). With 
two exceptions, each submission received at least three reviews, done by the 
program committee members or their subreferees (names appear below). Once 
the initial reviews were available, we had two weeks for conflict resolution and 
paper selection, supported by a database system with Web interfaces. 

I would like to express my sincere gratitude to Christian Schulte who took 
care of the software, handled the submissions, tracked the refereeing process, and 
finally assembled the proceedings. Then, of course, I am grateful to my fellow 
program committee members, the many additional referees, and the authors of 
the submitted papers. Finally, I have to thank Don Sannella, who smoothly organized
the program at the ETAPS level and relieved me of many organizational
burdens. 



January 2000 



Gert Smolka 
Functional Nets



Martin Odersky 

École Polytechnique Fédérale de Lausanne



Abstract. Functional nets combine key ideas of functional programming
and Petri nets to yield a simple and general programming notation.
They have their theoretical foundation in Join calculus. This paper
presents functional nets, reviews Join calculus, and shows how the two
relate.



1 Introduction 

Functional nets are a way to think about programs and computation which is 
born from a fusion of the essential ideas of functional programming and Petri 
nets. As in functional programming, the basic computation step in a functional 
net rewrites function applications to function bodies. As in Petri nets, a rewrite
step can require the combined presence of several inputs (where in this case 
inputs are function applications). This fusion of ideas from two different areas 
results in a style of programming which is at the same time very simple and very 
expressive. 

Functional nets have a theoretical foundation in join calculus [15,16]. They 
have the same relation to join calculus as classical functional programming has to 
λ-calculus. That is, functional nets constitute a programming method which derives
much of its simplicity and elegance from close connections to a fundamental
underlying calculus. λ-calculus [10,5] is ideally suited as a basis for functional
programs, but it can support mutable state only indirectly, and nondeterminism
and concurrency not at all. The pair of join calculus and functional nets
has much broader applicability: functional, imperative and concurrent program
constructions are supported with equal ease.

The purpose of this paper is two-fold. First, it aims to promote functional 
nets as an interesting programming method of wide applicability. We present 
a sequence of examples which show how functional nets can concisely model 
key constructs of functional, imperative, and concurrent programming, and how 
they often lead to better solutions to programming problems than conventional 
methods. 

Second, the paper develops concepts to link our programming notation of 
functional nets with the underlying calculus. To scale up from a calculus to 
a programming language, it is essential to have a means of aggregating functions
and data. We introduce qualified definitions as a new syntactic construct
for aggregation. In the context of functional nets, qualified definitions provide 
more flexible control over visibility and initialization than the more conventional 

G. Smolka (Ed.): ESOP/ETAPS 2000, LNCS 1782, pp. 1-25, 2000. 
record- or object-constructors. They are also an excellent fit to the underlying
join calculus, since they maintain the convention that every value has a name. 
We will present object-based join calculus, an extension of join calculus with 
qualified definitions. This extension comes at surprisingly low cost, in the sense 
that the calculus needs to be changed only minimally and all concepts carry over 
unchanged. By contrast, conventional record constructors would create anonymous
values, which would be at odds with the name-passing nature of join.

The notation for writing examples of functional nets is derived from Silk, 
a small language which maps directly into our object-based extension of join. 
An implementation of Silk is publicly available. There are also other languages 
which are based in some form on join calculus, and which express the constructs 
of functional nets in a different way, e.g. Join [17] or JoCaml [14]. We have chosen
to develop and present a new notation since we wanted to support both functions 
and objects in a way which was as simple as possible. 

As every program notation should be, functional nets are intended to be 
strongly typed, in the sense that all type errors should be detected rather than 
leading to unspecified behavior. We leave open whether type checking is done 
statically at compile time or dynamically at run time. Our examples do not mention
types, but they are all of a form that would be checkable using a standard
type system with recursive records, subtyping and polymorphism. 

The rest of this paper is structured as follows. Section 2 introduces functional 
nets and qualified definitions. Sections 3 and 4 show how common functional and 
imperative programming patterns can be modeled as functional nets. Section 5 
discusses concurrency and shows how functional nets model a wide spectrum 
of process synchronization techniques. Section 6 introduces object-based join 
calculus as the formal foundation of functional nets. Section 7 discusses how the 
programming notation used in previous sections can be encoded in this calculus. 
Section 8 discusses related work and concludes. 

2 A First Example 

Consider the task of implementing a one-place buffer, which connects producers 
and consumers of data. Producers call a function put to deposit data into the 
buffer while consumers call a function get to retrieve data from the buffer. There 
can be at most one datum in the buffer at any one time. A put operation on a 
buffer which is already full blocks until the buffer is empty. Likewise, a get on 
an empty buffer blocks until the buffer is full. This specification is realized by 
the following simple functional net: 

def get & full x = x & empty,
    put x & empty = () & full x

The net contains two definitions which together define four functions. Two of the 
functions, put and get, are meant to be called from the producer and consumer 
clients of the buffer. The other two, full and empty, reflect the buffer’s internal 
state, and should be called only from within the buffer. 
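
For readers more familiar with thread libraries, the behavior specified above can be approximated in Python. This is only a sketch under stated assumptions, not part of the functional-net notation: the class name OnePlaceBuffer and its fields are hypothetical, the tokens full x and empty are folded into a private flag guarded by a condition variable, and the buffer starts out empty by itself rather than being initialized by a client.

```python
import threading


class OnePlaceBuffer:
    """Blocking one-place buffer (hypothetical name, illustration only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False   # False plays the role of the token `empty`
        self._item = None    # payload of the token `full x`

    def put(self, x):
        # Mirrors `put x & empty = () & full x`:
        # wait for `empty`, then assert `full x` and return unit.
        with self._cond:
            while self._full:
                self._cond.wait()
            self._item, self._full = x, True
            self._cond.notify_all()

    def get(self):
        # Mirrors `get & full x = x & empty`:
        # wait for `full x`, assert `empty`, and return x.
        with self._cond:
            while not self._full:
                self._cond.wait()
            x, self._item, self._full = self._item, None, False
            self._cond.notify_all()
            return x
```

The two private fields correspond to the functions full and empty, which are likewise not meant to be called by clients.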
Function put takes a single argument, x. We often write a function argument
without surrounding parentheses, e.g. put x instead of put(x). We also admit
functions like get that do not take any argument; one can imagine that every
occurrence of such a function is augmented by an implicit empty tuple as argument,
e.g. get becomes get().

The two equations define rewrite rules. A set of function calls that matches 
the left-hand side of an equation may be rewritten to the equation’s right-hand 
side. The & symbol denotes parallel composition. We sometimes call & a fork if
it appears on an equation’s right-hand side, and a join if it appears on the left. 
Consequently, the left-hand sides of equations are also called join patterns. 

For instance, the equation 

get & full x = x & empty

states that if there are two concurrent calls, one to get and the other to full x 
for some value x, then those calls may be rewritten to the expression x & empty.
That expression returns x as get’s result and in parallel calls function empty.
Unlike get, empty does not return a result; its sole purpose is to enable calls to
put to proceed via the second rewrite rule. We call result-returning functions
like get synchronous, whereas functions like empty are called asynchronous. 

In general, only the leftmost operand of a fork or a join can return a result. 
All function symbols of a left-hand side but the first one are asynchronous. 
Likewise, all operands of a fork except the first one are asynchronous or their 
result is discarded. 

It’s now easy to interpret the second rewrite rule, 
put x & empty = () & full x

This rule states that two concurrent calls to put x and empty may be rewritten
to () & full x. The result part of that expression is the unit value (); it signals
termination and otherwise carries no interesting information. 

Clients of the buffer still need to initialize it by calling empty. A simple usage 
of the one-place buffer is illustrated in the following example. 

def get & full x = x & empty,
    put x & empty = () & full x;

put 1 &
( val y = get ; val r = y + y ; print r ; put r ) &
( val z = get ; val r = z * z ; print r ; put r ) &
empty

Besides the initializer empty there are three client processes composed in parallel. 
One process puts the number 1 into the buffer. The other two processes both try 
to get the buffer’s contents and put back a modified value. The construct 

val y = get ; ... 
evaluates the right-hand side expression get and defines y as a name for the
resulting value. The defined name y remains visible in the expression following 
the semicolon. By contrast, if we had written def y = get; ... we would have 
defined a function y, which each time it was called would call in turn get. The 
definition itself would not evaluate anything. 

As usual, a semicolon between expressions stands for sequencing. The combined
expression print r; put r first prints its argument r and then puts it into the
buffer.

The sequence in which the client processes in the above example execute is 
arbitrary, controlled only by the buffer’s rewrite rules. The effect of running the 
example program is hence the output of two numbers, either (2, 4) or (1, 2), 
depending on which client process came first.
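
The two possible outcomes can be reproduced with ordinary threads. The sketch below is illustrative only: it uses Python's queue.Queue with maxsize=1 as a stand-in for the one-place buffer, the helper name run_clients is hypothetical, printed values are collected in a list instead of being written to the console, and the second client is assumed to square the value it has just read.

```python
import queue
import threading


def run_clients():
    """Run the three client processes once and return the printed values."""
    buf = queue.Queue(maxsize=1)   # stand-in for the one-place buffer
    printed = []                   # stands in for `print`

    def client(op):
        v = buf.get()              # blocks until the buffer is full
        r = op(v)
        printed.append(r)
        buf.put(r)                 # blocks until the buffer is empty

    workers = [
        threading.Thread(target=client, args=(lambda y: y + y,)),
        threading.Thread(target=client, args=(lambda z: z * z,)),
    ]
    for w in workers:
        w.start()
    buf.put(1)                     # the process `put 1`
    for w in workers:
        w.join()
    return tuple(printed)
```

Which of the two results a given run returns depends on the thread scheduler, just as the order of rewrite steps is left open in the functional net.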

Objects The previous example mixed the definition of a one-place buffer and 
the client program using it. A better de-coupling is obtained by defining a constructor
function for one-place buffers. The constructor, together with a program
using it, can be written as follows.

def newBuffer = {
    def get & full x = x & empty,
        put x & empty = () & full x;
    (get, put) & empty
};
val (get’, put’) = newBuffer;
put’ 1 &
( val y = get’ ; val r = y + y ; print r ; put’ r ) &
( val z = get’ ; val r = z * z ; print r ; put’ r )

The defining equations of a one-place buffer are now local to a block, from which 
the pair of locally defined functions get and put is returned. Parallel to returning 
the result the buffer is initialized by calling empty. The initializer empty is now 
part of the constructor function; clients no longer can call it explicitly, since 
empty is defined in a local block and not returned as result of that block. Hence, 
newBuffer defines an object with externally visible methods get and put and 
private methods empty and full. The object is represented by a tuple which 
contains all externally visible methods. 

This representation is feasible as long as objects have only a few externally 
visible methods, but for objects with many methods the resulting long tuples 
quickly become unmanageable. Furthermore, tuples do not support a notion 
of subtyping, where an object can be substituted for another one with fewer 
methods. We therefore introduce records as a more suitable means of aggregation 
where individual methods can be accessed by their names, and subtyping is 
possible. 

The idiom for record access is standard. If r denotes a record, then r.f denotes 
the field of r named f. We also call references of the form r.f qualified names. The
idiom for record creation is less conventional. In most programming languages,
records are defined by enumerating all field names with their values. This notion
interacts poorly with the forms of definitions employed in functional nets. In 
a functional net, one often wants to export only some of the functions defined 
in a join pattern whereas other functions should remain hidden. Moreover, it 
is often necessary to call some of the hidden functions as part of the object’s 
initialization. 

To streamline the construction of objects, we introduce qualified names not 
only for record accesses, but also for record definitions. For instance, here is a 
re-formulation of the newBuffer function using qualified definitions. 

def newBuffer = {
    def this.get & full x = x & empty,
        this.put x & empty = () & full x;
    this & empty
};
val buf = newBuffer;
buf.put 1 &
( val y = buf.get ; val r = y + y ; print r ; buf.put r ) &
( val z = buf.get ; val r = z * z ; print r ; buf.put r )

Note the occurrence of the qualified names this.get and this.put on the left-hand
side of the local definitions. These definitions introduce three local names: 

— the local name this, which denotes a record with two fields, get and put, and 

— local names empty and full, which denote functions. 

Note that the naming of this is arbitrary; any other name would work equally
well. Note also that empty and full are not part of the record returned from
newBuffer, so that they can be accessed only internally.

The identifiers which occur before a period in a join pattern always define 
new record names, which are defined only in the enclosing definition. It is not 
possible to use this form of qualified definition to add new fields to a record 
defined elsewhere. 

Some Notes on Syntax We assume the following order of precedence, from strong 
to weak: 

( ) and (.) , (&) , (=) , (,) , (;) .

That is, function application and selection bind strongest, followed by parallel 
composition, followed by the equal sign, followed by comma, and finally followed 
by semicolon. Function application and selection are left associative, & is associative,
and ; is right associative. Other standard operators such as +, *, == fall
between function application and & in their usual order of precedence. When 
precedence risks being unclear, we’ll use parentheses to disambiguate. 

As a syntactic convenience, we allow indentation instead of ;-separators inside 
blocks delimited with braces { and }. Except for the significance of indentation,
braces are equivalent to parentheses. The rules are as follows: (1) in a block delimited
with braces, a semicolon is inserted in front of any non-empty line which
starts at the same indentation level as the first symbol following the opening
brace, provided the symbol after the insertion point can start an expression or
definition. The only modification to this rule is: (2) if inserted semicolons would
separate two def blocks, yielding def D1 ; def D2 say, then the two def blocks
are instead merged into a single block, i.e. def D1, D2. (3) The top level program
is treated like a block delimited with braces, i.e. indentation is significant.

With these rules, the newBuffer example can alternatively be written as follows.

def newBuffer = {
    def this.get & full x = x & empty
    def this.put x & empty = () & full x
    this & empty
}
val buf = newBuffer
buf.put 1 &
{ val y = buf.get ; val r = y + y ; print r ; buf.put r } &
{ val z = buf.get ; val r = z * z ; print r ; buf.put r }

A common special case of a qualified definition is the definition of a record with 
only externally visible methods: 

( def this.f = ... , this.g = ... ; this ) 

This idiom can be abbreviated by omitting the this qualifier and writing only 
the definitions. 

( def f = ... , g = ... ) 

3 Functional Programming 

A functional net that does not contain any occurrences of & is a purely functional 
program. For example, here’s the factorial function written as a functional net. 

def factorial n = if (n == 0) 1 

else n * factorial (n-1) 

Except for minor syntactical details, there’s nothing which distinguishes this 
program from a program written in a functional language like Haskell or ML. 
We assume that evaluation of function arguments is strict: In the call f (g x), g x 
will be evaluated first and its value will be passed to f. 

Functional programs often work with recursive data structures such as trees 
and lists. In Lisp or Scheme such data structures are primitive S-expressions, 
whereas in ML or Haskell they are definable as algebraic data types. Our functional
net notation does not have a primitive tree type, nor has it constructs
for defining algebraic data types and for pattern matching their values. It does
not need to, since these constructs can be represented with records, using the
Visitor pattern [18].

The visitor pattern is the object-oriented version of the standard Church 
encoding of algebraic data types. A visitor encodes the branches of a pattern 
matching case expression. It is represented as a record with one method for each 
branch. For instance, a visitor for lists would always have two methods: 

def Nil = ... 

def Cons (x, xs) = ... 

The intention is that our translation of pattern matching would call either the 
Nil method or the Cons method of a given visitor, depending on what kind of list
was encountered. If the encountered list resulted from a Cons we also need to 
pass the arguments of the original Cons to the visitor’s Cons. 

Assume we have already defined a method match for lists that takes a list 
visitor as argument and has the behavior just described. Then one could write 
an isEmpty test function over lists as follows: 

def isEmpty xs = xs.match {
    def Nil = true
    def Cons (x, xs1) = false
}

More generally, every function over lists can be defined in terms of match. So, in 
order to define a record which represents a list, all we need to do is to provide a 
match method. How should match be defined? Clearly, its behavior will depend 
on whether it is called on an empty or non-empty list. Therefore, we define two 
list constructors Nil and Cons, with two different implementations for
match. The implementations are straightforward: 

val List = { 

def Nil = { def match v = v.Nil } 

def Cons (x, xs) = { def match v = v.Cons (x, xs) } 

} 

In each case, match simply calls the appropriate method of its visitor argument 
v, passing any parameters along. We have chosen to wrap the Nil and Cons
constructors in another record, named List. List acts as a module, which provides
the constructors of the list data type. Clients of the List module then construct
lists using qualified names List.Nil and List.Cons. Example:

def concat (xs, ys) = xs.match {
    def Nil = ys
    def Cons (x, xs1) = List.Cons (x, concat (xs1, ys))
}

Note that the qualification with List lets us distinguish the constructor Cons, 
defined in List, from the visitor method Cons, which is defined locally. 
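
The visitor encoding carries over to any language with records and first-class functions. As a hedged illustration, the following Python sketch mirrors the List module, isEmpty and concat. It assumes SimpleNamespace objects as records and represents parameterless methods as zero-argument functions; the names is_empty and to_python are hypothetical helpers that do not appear in the functional-net notation.

```python
from types import SimpleNamespace


def Nil():
    # An empty list: `match` calls the visitor's Nil method.
    return SimpleNamespace(match=lambda v: v.Nil())


def Cons(x, xs):
    # A non-empty list: `match` passes head and tail to the visitor's Cons.
    return SimpleNamespace(match=lambda v: v.Cons(x, xs))


# The record `List` acts as a module providing the two constructors.
List = SimpleNamespace(Nil=Nil, Cons=Cons)


def is_empty(xs):
    return xs.match(SimpleNamespace(
        Nil=lambda: True,
        Cons=lambda x, xs1: False))


def concat(xs, ys):
    return xs.match(SimpleNamespace(
        Nil=lambda: ys,
        Cons=lambda x, xs1: List.Cons(x, concat(xs1, ys))))


def to_python(xs):
    # Helper for inspection only: convert an encoded list to a Python list.
    return xs.match(SimpleNamespace(
        Nil=lambda: [],
        Cons=lambda x, xs1: [x] + to_python(xs1)))
```

For instance, to_python(concat(List.Cons(1, List.Nil()), List.Cons(2, List.Nil()))) evaluates to the Python list [1, 2].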
4 Imperative Programming

Imperative programming extends purely functional programming with the addition
of mutable variables. A mutable variable can be modeled as a reference cell
object, which can be constructed as follows. 

def newRef initial = {
    def this.value & state x = x & state x,
        this.update y & state x = () & state y
    this & state initial
}

The structure of these definitions is similar to the one-place buffer in Section 2. 
The two synchronous functions value and update access and update the variable’s 
current value. The asynchronous function state serves to remember the variable’s 
current value. The reference cell is initialized by calling state with the initial value. 
Here is a simple example of how references are used: 

val count = newRef 0
def increment = count.update (count.value + 1)
increment

Building on reference cell objects, we can express the usual variable access notation
of imperative languages by two simple syntactic expansions:

var x := E expands to val _x = newRef E ; def x = _x.value

x := E expands to _x.update E

The count example above could then be written more conventionally as follows. 

var count := 0 

def increment = count := count + 1
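
The token-passing reading of the reference cell can be made concrete in Python. The following sketch is an approximation under stated assumptions: a queue.Queue of capacity one holds the single state x token, value and update each consume that token and re-assert a new one, and the names new_ref and state are hypothetical.

```python
import queue
from types import SimpleNamespace


def new_ref(initial):
    state = queue.Queue(maxsize=1)   # holds the single `state x` token

    def value():
        # `this.value & state x = x & state x`
        x = state.get()              # consume `state x`
        state.put(x)                 # re-assert `state x`
        return x

    def update(y):
        # `this.update y & state x = () & state y`
        state.get()                  # consume the old `state x`
        state.put(y)                 # assert `state y`

    state.put(initial)               # `this & state initial`
    return SimpleNamespace(value=value, update=update)


count = new_ref(0)
count.update(count.value() + 1)      # the body of `increment`
```

As in the functional net, each of value and update is atomic, but the combination used for increment is not: another process could update the cell between the read and the write.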

In the object-oriented design and programming area, an object is often characterized
as having “state, behavior, and identity”. Our encoding of objects expresses
state as a collection of applications of private asynchronous functions, and behavior
as a collection of externally visible functions. But what about identity? If
functional net objects had an observable identity it should be possible to define
a method eq which returns true if and only if its argument is the same object
as the current object. Here “sameness” has to be interpreted as “created by the
same operation”; structural equality is not enough. E.g., assuming that the (as
yet hypothetical) eq method was added to reference objects, it should be possible
to write val (r1, r2) = (newRef 0, newRef 0) and to have r1.eq(r1) == true
and r1.eq(r2) == false.

Functional nets have no predefined operation which tests whether two names
or references are the same. However, it is still possible to implement an eq
method. Here’s our first attempt, which still needs to be refined later. 
Fig. 1. Analogy to Petri nets



def newObjectWithIdentity = {
    def this.eq other & flag x = resetFlag (other.testFlag & flag true)
        this.testFlag & flag x = x & flag x
        resetFlag result & flag x = result & flag false
    this & flag false
}

This defines a generator function for objects with an eq method that tests for 
identity. The implementation of eq relies on three helper functions, flag, testFlag, 
and resetFlag. Between calls to the eq method, flag false is always asserted. The 
trick is that the eq method asserts flag true and at the same time tests whether 
other.flag is true. If the current object and the other object are the same, that test
will yield true. On the other hand, if the current object and the other object are 
different, the test will yield false, provided there is not at the same time another 
ongoing eq operation on object other. Hence, we have arrived at a solution of 
our problem, provided we can prevent overlapping eq operations on the same 
objects. In the next section, we will develop techniques to do so. 

5 Concurrency 

The previous sections have shown how functional nets can express sequential 
programs, both in functional and in imperative style. In this section, we will show 
their utility in expressing common patterns of concurrent program execution. 

Functional nets support a resource-based view of concurrency, where calls
model resources, & expresses conjunction of resources, and a definition acts as
a rewrite rule which maps input sets of resources into output sets of resources. 
This view is very similar to the one of Petri nets [29,32]. In fact, there are 
direct analogies between the elements of Petri nets and functional nets. This is 
illustrated in Figure 1. 

A transition in a Petri net corresponds to a rewrite rule in a functional net. 
A place in a Petri net corresponds to a function symbol applied to some (formal 
or actual) arguments. A token in a Petri net corresponds to some actual call 
during the execution of a functional net (in analogy to Petri nets, we will also 
call applications of asynchronous functions tokens). The basic execution step 
of a Petri net is the firing of a transition which has as a precondition that all
in-going places have tokens in them. Quite similarly, the basic execution step of 
a functional net is a rewriting according to some rewrite rule, which has as a 
precondition that all function symbols of the rule’s left-hand side have matching 
calls. 
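
The firing discipline described above can be illustrated by a very small token-rewriting loop. This Python sketch rests on simplifying assumptions that are not part of functional nets: every call, synchronous or not, is treated as a token (name, argument), the result of a synchronous call is modeled as an extra token named result, rules are tried in a fixed order, and the names fire_once, run and BUFFER_RULES are hypothetical.

```python
def fire_once(tokens, rules):
    """Try to fire one rule; return (new_tokens, fired)."""
    for lhs, rhs in rules:
        pool = list(tokens)
        args = []
        for name in lhs:
            hit = next((t for t in pool if t[0] == name), None)
            if hit is None:
                break                 # precondition not met; try the next rule
            pool.remove(hit)          # consume the matching token
            args.append(hit[1])
        else:
            # All function symbols of the left-hand side had matching calls.
            return pool + rhs(*args), True
    return list(tokens), False


def run(tokens, rules):
    """Rewrite until no rule is enabled any more."""
    fired = True
    while fired:
        tokens, fired = fire_once(tokens, rules)
    return tokens


# The one-place buffer of Section 2 as two rewrite rules.
BUFFER_RULES = [
    (("get", "full"), lambda _, x: [("result", x), ("empty", None)]),
    (("put", "empty"), lambda x, _: [("full", x)]),
]
```

Starting from the tokens put 1, get and empty, only the second rule is enabled at first; once it has produced full 1, the first rule fires and leaves a result token carrying 1 together with a fresh empty token.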

Functional nets are considerably more powerful than conventional Petri nets, 
however. First, function applications in a functional net can have arguments, 
whereas tokens in a Petri net are unstructured. Second, functions in a functional 
net can be higher-order, in that they can have functions as their arguments. In 
Petri nets, such self-referentiality is not possible. Third, definitions in a functional 
net can be nested inside rewrite rules, such that evolving net topologies are 
possible. A Petri net, on the other hand, has a fixed connection structure.

Colored Petri nets [24] let one pass parameters along the arrows connecting 
places with transitions. These nets are equivalent to first-order functional nets 
with only global definitions. They still cannot express the higher-order and evolution
aspects of functional nets. Bussi and Asperti have translated join calculus
ideas into standard Petri net formalisms. Their mobile Petri nets [4] support
first-class functions and evolution, and drop at the same time the locality restrictions
of join calculus and functional nets. That is, their notation separates
function name introduction from rewrite rule definition, and allows a function 
to be defined collectively by several unrelated definitions. 

In the following, we will present several well-known schemes for process synchronization
and how they each can be expressed as functional nets.

Semaphores A common mechanism for process synchronization is a lock (or: 
semaphore). A lock offers two atomic actions: getLock and releaseLock. Here’s 
the implementation of a lock as a functional net: 

def newLock = {
    def this.getLock & this.releaseLock = ()
    this & this.releaseLock
}

A typical usage of a semaphore would be: 

val s = newLock ; ...
s.getLock ; "< critical region >" ; s.releaseLock

With semaphores, we can now complete our example to define objects with 
identity: 

val global = newLock
def newObjectWithIdentity = {
    def this.eq other = global.getLock ; this.testEq other ; global.releaseLock
        this.testEq other & flag x = resetFlag (other.testFlag & flag true)
        this.testFlag & flag x = x & flag x
        resetFlag result & flag x = result & flag false
    this & flag false
}
This code makes use of a global lock to serialize all calls of eq methods. This
is admittedly a brute force approach to mutual exclusion, which also serializes 
calls to eq over disjoint pairs of objects. A more refined locking strategy is hard 
to come by, however. Conceptually, a critical region consists of a pair of objects 
which both have to be locked. A naive approach would lock first one object, then 
the other. But this would carry the risk of deadlocks, when two concurrent eq 
operations involve the same objects, but in different order. 

Asynchronous Channels Quite similar to a semaphore is the definition of an 
asynchronous channel with two operations, read and write: 

def newAsyncChannel = {
  def this.read & this.write x = x
  this
}

Asynchronous channels are the fundamental communication primitive of
asynchronous π-calculus [8,23] and languages based on it, e.g. Pict [30] or
Piccola [1]. A typical usage scenario of an asynchronous channel would be:

val c = newAsyncChannel
def producer = {
  var x := 1
  while (true) { val y := x ; x := x + 1 & c.write y }
}
def consumer = {
  while (true) { val y = c.read ; print y }
}
producer & consumer

The producer in the above scenario writes consecutive integers to the channel c 
which are read and printed by the consumer. The writing is done asynchronously, 
in parallel to the rest of the body of the producer’s while loop. Hence, there is 
no guarantee that numbers will be read and printed in the same order as they 
were written. 
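
For readers who want to experiment outside Silk, the following is a minimal
Python sketch of an asynchronous channel, not the functional-net encoding
itself. It assumes threads in place of forked processes; the names
AsyncChannel and run_demo are hypothetical. Note that the FIFO queue used here
delivers values in the order written, which is stronger than the unordered
behaviour described above.

```python
import queue
import threading


class AsyncChannel:
    """Asynchronous channel: write never blocks, read waits for a pending write."""

    def __init__(self):
        # Pending `write x` tokens. SimpleQueue is FIFO, which is stronger
        # than the unordered semantics of the functional net (an assumption
        # made here for simplicity).
        self._pending = queue.SimpleQueue()

    def write(self, x):
        # Asynchronous: deposit the value and return immediately.
        self._pending.put(x)

    def read(self):
        # Synchronous: block until some write is pending, then return its value.
        return self._pending.get()


def run_demo(n=5):
    """Hypothetical driver: one producer and one consumer exchange n integers."""
    c = AsyncChannel()
    received = []

    def producer():
        for x in range(1, n + 1):
            c.write(x)

    def consumer():
        for _ in range(n):
            received.append(c.read())

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Because write never waits for a reader, a fast producer can fill the queue
without bound in this sketch as well.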



Synchronous Channels A potential problem with the previous example is that
the producer might produce data much more rapidly than the consumer consumes
them. In this case, the number of pending write operations might increase
indefinitely, or until memory is exhausted. The problem can be avoided by
connecting producer and consumer with a synchronous channel.

In a synchronous channel, both reads and writes return and each operation
blocks until the other operation is called. Synchronous channels are the
fundamental communication primitive of classical π-calculus [27]. They can be
represented as functional nets as follows.
def newSyncChannel = {
  def this.read & noReads     = read1 & read2,
      this.write x & noWrites = write1 & write2 x,
      read1 & write2 x        = x & noWrites,
      write1 & read2          = () & noReads
  this
}



This implementation is more involved than the one for asynchronous channels.
The added complexity stems from the fact that a synchronous channel connects
two synchronous operations, yet in each join pattern there can be only one
function that returns. Our solution is similar to a double handshake protocol. It
splits up read and write into two sub-operations each, read1, read2 and write1,
write2. The sub-operations are then matched in two join patterns, in opposite
senses. In one pattern it is the read sub-operation which returns whereas in
the second one it is the write sub-operation. The noReads and noWrites tokens
are necessary for serializing reads and writes, so that a second write operation
can only start after the previous read operation is finished and vice versa. With
synchronous channels, our producer/consumer example can be written as follows.



val c = newSyncChannel
def producer = {
  var x := 1
  while (true) { c.write x ; x := x + 1 }
}
def consumer = {
  while (true) { val y = c.read ; print y }
}
producer & consumer
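
The rendezvous behaviour can also be tried out with ordinary semaphores. The
Python sketch below is an analogue under stated assumptions, not a transcription
of the four join rules: the class SyncChannel, its fields, and run_demo are
hypothetical names, and the handshake is simplified so that each operation
releases its own serialization token.

```python
import threading


class SyncChannel:
    """Rendezvous channel: write returns only after a read has taken the value."""

    def __init__(self):
        self._no_writes = threading.Semaphore(1)   # plays the role of noWrites
        self._no_reads = threading.Semaphore(1)    # plays the role of noReads
        self._item_ready = threading.Semaphore(0)  # a value has been offered
        self._item_taken = threading.Semaphore(0)  # the value has been consumed
        self._slot = None

    def write(self, x):
        self._no_writes.acquire()    # serialize writers
        self._slot = x
        self._item_ready.release()   # first half of the handshake: offer x
        self._item_taken.acquire()   # second half: wait until a reader took it
        self._no_writes.release()

    def read(self):
        self._no_reads.acquire()     # serialize readers
        self._item_ready.acquire()   # wait for an offered value
        x = self._slot
        self._item_taken.release()   # let the blocked writer return
        self._no_reads.release()
        return x


def run_demo(n=5):
    """Hypothetical driver mirroring the producer/consumer example."""
    c = SyncChannel()
    received = []

    def producer():
        for x in range(1, n + 1):
            c.write(x)

    def consumer():
        for _ in range(n):
            received.append(c.read())

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

With a single producer and a single consumer the values arrive in the order
written, since each write waits for its matching read.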



Monitors Another scheme for process communication is to use a common store
made up of mutable variables, and to use mutual exclusion mechanisms to
prevent multiple processes from updating the same variable at the same time. A
simple mutual exclusion mechanism is the monitor [20,21], which ensures that
only one of a set of functions $f_1, \ldots, f_k$ can be active at any one
time. A monitor is easily represented using an additional asynchronous
function, turn. The turn token acts as a resource which is consumed at the
start of each function $f_i$ and which is reproduced at the end:

def f1 & turn = ... ; turn,
    ...
    fk & turn = ... ; turn

For instance, here is an example of a counter which can be incremented and 
decremented: 
def newBiCounter = {
  var count := 0
  def this.increment & turn = count := count + 1 ; turn
  def this.decrement & turn = count := count - 1 ; turn
  this
}
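
As a rough analogue in a mainstream language, the turn token can be modelled
by a semaphore holding a single permit. The Python sketch below assumes
threads as processes; BiCounter and run_demo are hypothetical names.

```python
import threading


class BiCounter:
    """Counter whose methods are serialized by a single `turn` token."""

    def __init__(self):
        self.count = 0
        self._turn = threading.Semaphore(1)  # exactly one turn token exists

    def increment(self):
        self._turn.acquire()          # consume the token at the start
        try:
            self.count += 1
        finally:
            self._turn.release()      # reproduce the token at the end

    def decrement(self):
        self._turn.acquire()
        try:
            self.count -= 1
        finally:
            self._turn.release()


def run_demo():
    """Hypothetical driver: 4 incrementing and 2 decrementing threads."""
    counter = BiCounter()

    def up():
        for _ in range(1000):
            counter.increment()

    def down():
        for _ in range(500):
            counter.decrement()

    threads = [threading.Thread(target=up) for _ in range(4)]
    threads += [threading.Thread(target=down) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.count
```

Since every update holds the token, the final count is 4 * 1000 - 2 * 500
regardless of how the threads interleave.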

Readers and Writers A more complex form of synchronization distinguishes bet- 
ween readers which access a common resource without modifying it and writers 
which can both access and modify it. To synchronize readers and writers we need 
to implement operations startRead, startWrite, endRead, endWrite, such that: 

— there can be multiple concurrent readers, 

— there can only be one writer at one time, 

— pending write requests have priority over pending read requests, but don’t 
preempt ongoing read operations. 

This form of access control is common in databases. It can be implemented using 
traditional synchronization mechanisms such as semaphores, but this is far from 
trivial. We arrive at a functional net solution to the problem in two steps. 

The initial solution is given at the top of Figure 2. We make use of two
auxiliary tokens. The token readers n keeps track in its argument n of the number
of ongoing reads, while writers n keeps track in n of the number of pending writes.
A startRead operation requires that there are no pending writes to proceed, i.e.
writers 0 must be asserted. In that case, startRead continues with startRead1,
which reasserts writers 0, increments the number of ongoing readers, and returns
to its caller. By contrast, a startWrite operation immediately increments the
number of pending writes. It then continues with startWrite1, which waits for
the number of readers to be 0 and then returns. Note the almost-symmetry
between startRead and startWrite, where the different order of actions reflects
the different priorities of readers and writers.

This solution is simple enough to trust its correctness. But the present for- 
mulation is not yet valid Silk because we have made use of numeric arguments in 
join patterns. For instance readers 0 expresses the condition that the number of 
readers is zero. We arrive at an equivalent formulation in Silk through factoriza- 
tion. A function such as readers which represents a condition is split into several 
sub-functions which together partition the condition into its cases of interest. In 
our case we should have a token noReaders which expresses the fact that there 
are no ongoing reads as well as a token readers n, where n is now required to be 
positive. Similarly, writers n is now augmented by a case noWriters. After split- 
ting and introducing the necessary case distinctions, one obtains the functional 
net listed at the bottom of Figure 2. 
Initial solution:

def newReadersWriters = {
  def this.startRead  & writers 0 = startRead1,
      startRead1      & readers n = () & writers 0 & readers (n+1),
      this.startWrite & writers n = startWrite1 & writers (n+1),
      startWrite1     & readers 0 = (),
      this.endRead    & readers n = readers (n-1),
      this.endWrite   & writers n = writers (n-1) & readers 0
  this & readers 0 & writers 0
}

After factorization:

def newReadersWriters = {
  def this.startRead  & noWriters = startRead1,
      startRead1      & noReaders = () & noWriters & readers 1,
      startRead1      & readers n = () & noWriters & readers (n+1),
      this.startWrite & noWriters = startWrite1 & writers 1,
      this.startWrite & writers n = startWrite1 & writers (n+1),
      startWrite1     & noReaders = (),
      this.endRead    & readers n = if (n == 1) noReaders
                                    else readers (n-1),
      this.endWrite   & writers n = noReaders &
                                    if (n == 1) noWriters
                                    else writers (n-1)
  this & noReaders & noWriters
}

Fig. 2. Readers/writers synchronization
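
The access rules of Figure 2 can be approximated with a condition variable.
The Python sketch below is an illustration under stated assumptions rather
than a translation of the net: the counters mirror the readers n and writers n
tokens, the extra flag _writing stands in for the fact that an active writer
holds the readers token, and all names (ReadersWriters, run_demo) are
hypothetical.

```python
import threading


class ReadersWriters:
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # ongoing reads, as in the token `readers n`
        self._writers = 0      # pending or active writes, as in `writers n`
        self._writing = False  # True while a writer holds the resource

    def start_read(self):
        with self._cond:
            # A read may start only if no write is pending (writer priority).
            self._cond.wait_for(lambda: self._writers == 0)
            self._readers += 1

    def end_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def start_write(self):
        with self._cond:
            self._writers += 1  # announce the write immediately
            # Proceed once ongoing reads have drained and no writer is active.
            self._cond.wait_for(
                lambda: self._readers == 0 and not self._writing)
            self._writing = True

    def end_write(self):
        with self._cond:
            self._writers -= 1
            self._writing = False
            self._cond.notify_all()


def run_demo(rounds=200):
    """Hypothetical stress test: counts violations of the access rules."""
    rw = ReadersWriters()
    guard = threading.Lock()
    state = {"readers": 0, "writers": 0, "violations": 0}

    def reader():
        for _ in range(rounds):
            rw.start_read()
            with guard:
                state["readers"] += 1
                state["violations"] += int(state["writers"] > 0)
            with guard:
                state["readers"] -= 1
            rw.end_read()

    def writer():
        for _ in range(rounds):
            rw.start_write()
            with guard:
                state["writers"] += 1
                state["violations"] += int(
                    state["writers"] > 1 or state["readers"] > 0)
            with guard:
                state["writers"] -= 1
            rw.end_write()

    threads = [threading.Thread(target=reader) for _ in range(3)]
    threads += [threading.Thread(target=writer) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["violations"]
```

As in the functional net, a pending write blocks new reads but does not
preempt reads that are already in progress.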



6 Foundations: The Join Calculus 

Functional nets have their formal basis in join calculus [15]. We now present this
basis, in three stages. In the first stage, we study a subset of join calculus which
can be taken as the formal basis of purely functional programs. This calculus is
equivalent to (call-by-value) λ-calculus [31], but takes the opposite position on
naming functions. Where λ-calculus knows only anonymous functions, functional
join calculus insists that every function have a name. Furthermore, it also insists
that every intermediate result be named. As such it is quite similar to common
forms of intermediate code found in compilers for functional languages.

The second stage adds fork and join operators to the constructs introduced 
in the first stage. The calculus developed at this stage is equivalent in principle 
to the original join calculus, but some syntactical details have changed. 

The third stage adds qualified names in definitions and accesses. The calculus 
developed in this stage can model the object-based functional nets we have used. 
Syntax:

  Names               $a, b, c, \ldots, x, y, z$
  Terms               $M, N ::= \mathbf{def}\ D ; M \mid x(\tilde{y})$
  Definitions         $D ::= L = M$
  Left-hand sides     $L ::= x(\tilde{y})$
  Reduction contexts  $R ::= [\,] \mid \mathbf{def}\ D ; R$

Structural Equivalence: α-renaming.

Reduction:

  $\mathbf{def}\ x(\tilde{y}) = M ; R[x(\tilde{z})] \to \mathbf{def}\ x(\tilde{y}) = M ; R[[\tilde{z}/\tilde{y}]M]$

Fig. 3. Pure functional calculus



All three stages represent functional nets as a reduction system. There is
in each case only a single rewrite rule, which is similar to the β-reduction rule
of λ-calculus, thus closely matching intuitions of functional programming. By
contrast, the original treatment of join calculus is based on a chemical abstract
machine [6], a concept well established in concurrency theory. The two versions
of join calculus complement each other and are (modulo some minor syntactical
details) equivalent.



6.1 Pure Functional Calculus 

Figure 3 presents the subset of join calculus which can express purely functional
programs. The syntax of this calculus is quite small. A term M is either a
function application $x(\tilde{y})$ or a term with a local definition, def D ; M
(we let $\tilde{x}$ stand for a sequence $x_1, \ldots, x_n$ of names, where
$n \geq 0$). A definition D is a single rewrite rule of the form L = M. The
left-hand side L of a rewrite rule is again a function application
$x(\tilde{y})$. We require that the formal parameters $y_i$ of a left-hand side
are pairwise disjoint. The right-hand side of a rewrite rule is an arbitrary
term.

The set of defined names dn(D) of a definition D of the form $x(\tilde{y}) = M$
consists of just the function name x. Its local names ln(D) are the formal
parameters $\tilde{y}$. The free names fn(M) of a term M are all names which are
not defined by or local to a definition in M. The free names of a definition D
are its defined names and any names which are free in the definition’s
right-hand side, yet different from the local names of D. All names occurring in
a term M that are not free in M are called bound in M. Figure 6 presents a
formal definition of these sets for the object-based extension of join calculus.

To avoid unwanted name capture, where free names become bound inadvertently,
we will always write terms subject to the following hygiene condition: We
assume that the sets of free and bound names of every term we write are disjoint.
This can always be achieved by a suitable renaming of bound variables, according
to the α-renaming law. This law lets us rename local and defined names of
definitions, provided that the new names do not clash with names which already
exist in their scope. It is formalized by two equations. First,

  $\mathbf{def}\ x(\tilde{y}) = M ; N \equiv \mathbf{def}\ u(\tilde{y}) = [u/x]M ; [u/x]N$

if $u \notin \mathrm{fn}(M) \cup \mathrm{fn}(N)$. Second,

  $\mathbf{def}\ x(\tilde{y}) = M ; N \equiv \mathbf{def}\ x(\tilde{v}) = [\tilde{v}/\tilde{y}]M ; N$

if $\{\tilde{v}\} \cap \mathrm{fn}(M) = \emptyset$ and the elements of
$\tilde{v}$ are pairwise disjoint. Here, $[u/x]$ and $[\tilde{v}/\tilde{y}]$ are
substitutions which map x and $\tilde{y}$ to u and $\tilde{v}$. Generally,
substitutions are idempotent functions over names which map all but a finite
number of names to themselves. The domain $\mathrm{dom}(\sigma)$ of a
substitution $\sigma$ is the set of names not mapped to themselves under
$\sigma$.

Generally, we will give in each case a structural equivalence relation $\equiv$
which is assumed to be reflexive, transitive, and compatible (i.e. closed under
formation of contexts). Terms that are related by $\equiv$ are identified with
each other. For the purely functional calculus, $\equiv$ is just α-renaming.
Extended calculi will have richer notions of structural equivalence.

Execution of terms in our calculus is defined by rewriting. Figure 3 defines
a single rewrite rule, which is analogous to β-reduction in λ-calculus. The rule
can be sketched as follows:

  $\mathbf{def}\ x(\tilde{y}) = M ; \ldots x(\tilde{z}) \ldots \to \mathbf{def}\ x(\tilde{y}) = M ; \ldots [\tilde{z}/\tilde{y}]M \ldots$

That is, if there is an application $x(\tilde{z})$ which matches a definition of
x, say $x(\tilde{y}) = M$, then we can rewrite the application to the
definition’s right-hand side M, after replacing formal parameters $\tilde{y}$ by
actual arguments $\tilde{z}$.

The above formulation is not yet completely precise because we still have to 
specify where exactly a reducible application can be located in a term. Clearly,
the application must be within the definition’s scope. Also, we want to reduce 
only those applications which are not themselves contained in another local 
definition. For instance, in 

def f (x, k) = k x ;
def g (x, k) = f (1, k) ;
f (2, k)

we want to reduce only the second application of f, not the first one which is 
contained in the body of function g. This restriction in the choice of reducible 
applications avoids potentially unnecessary work. For instance in the code frag- 
ment above g is never called, so it would make no sense to reduce its body. More 
importantly, once we add side-effects to our language, it is essential that the 
body of a function is executed (i.e. reduced) only when the function is applied. 
The characterization of reducible applications can be formalized using the
idea of a reduction context¹. A context C is a term with a hole, which is written
[ ]. The expression C[M] denotes the term resulting from filling the hole of the
context C with M. A reduction context R is a context of a special form, in which
the hole can be only at places where a function application would be reducible.
The set of possible reduction contexts for our calculus is generated by a simple
context-free grammar, given in Figure 3. This grammar says that reduction can
only take place at the top of a term, or in the scope of some local definitions.

Reduction contexts are used in the formulation of the reduction law in
Figure 3. Generally, we let the reduction relation $\to$ between terms be the
smallest compatible relation that contains the reduction law.

An alternative formulation of the reduction rule abstracts from the concrete
substitution operator:

  $\mathbf{def}\ L = M ; R[\sigma L] \to \mathbf{def}\ L = M ; R[\sigma M]$

if $\sigma$ is a substitution from names to names with
$\mathrm{dom}(\sigma) \subseteq \mathrm{ln}(L)$.

The advantage of the alternative formulation is that it generalizes readily to 
the more complex join patterns which will be introduced in the next sub-section. 

As an example of functional reduction, consider the following forwarding
function:

def f(x) = g(x) ; f(y)  →  def f(x) = g(x) ; g(y)

A slightly more complex example is the following reduction of a call to an
evaluation function, which takes two arguments and applies one to the other:

def apply(f,x) = f(x) ; apply(print, 1)  →  def apply(f,x) = f(x) ; print(1)
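
To make the reduction rule and the role of reduction contexts concrete, here
is a small Python sketch of a one-step reducer for the pure functional
calculus. It is an illustration only: the classes App and Def and the
functions subst and step are hypothetical names, terms are assumed to satisfy
the hygiene condition, and constants such as 1 are treated as plain names.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class App:
    """Function application x(y1, ..., yn)."""
    fn: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class Def:
    """Local definition: def name(params) = body ; rest."""
    name: str
    params: Tuple[str, ...]
    body: "Term"
    rest: "Term"


Term = Union[App, Def]


def subst(term: Term, s: Dict[str, str]) -> Term:
    """Apply a name-to-name substitution, respecting binders (hygiene assumed)."""
    if isinstance(term, App):
        return App(s.get(term.fn, term.fn),
                   tuple(s.get(a, a) for a in term.args))
    outer = {k: v for k, v in s.items() if k != term.name}
    inner = {k: v for k, v in outer.items() if k not in term.params}
    return Def(term.name, term.params,
               subst(term.body, inner), subst(term.rest, outer))


def step(term: Term, env: Optional[Dict[str, Def]] = None) -> Optional[Term]:
    """Perform one reduction step, or return None if no step applies.

    The hole of a reduction context R ::= [] | def D ; R is reached by
    walking down the `rest` components only; bodies are never reduced.
    """
    env = dict(env or {})
    if isinstance(term, Def):
        env[term.name] = term
        reduced = step(term.rest, env)
        if reduced is None:
            return None
        return Def(term.name, term.params, term.body, reduced)
    d = env.get(term.fn)
    if d is None or len(d.params) != len(term.args):
        return None
    # Replace the application by the body with formals mapped to actuals.
    return subst(d.body, dict(zip(d.params, term.args)))


# def apply(f, x) = f(x) ; apply(print, 1)
example = Def("apply", ("f", "x"), App("f", ("x",)),
              App("apply", ("print", "1")))
```

Calling step(example) rewrites the trailing application to print(1) and leaves
the definition of apply in place; a second call returns None because print has
no definition in scope.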



6.2 Canonical Join Calculus 

Figure 4 presents the standard version of join calculus. Compared to the purely
functional subset, there are three syntax additions: First and second, & is now
introduced as a fork operator on terms and as a join operator on left-hand sides.
Third, definitions can now consist of more than one rewrite rule, so that multiple
definitions of the same function symbol are possible.

The latter addition is essentially for convenience, as one can translate every 
program with definitions consisting of multiple rewrite rules to a program that 
uses just one rewrite rule for each definition [15]. The convenience is great enough 
to warrant a syntax extension because the encoding is rather heavy. 

The notion of structural equivalence is now more refined than in the purely
functional subset. Besides α-renaming, there are three other sets of laws which
identify terms. First, the fork operator is assumed to be associative and
commutative. Second, the comma operator which conjoins rewrite rules is also taken to

¹ The concept is usually known under the name “evaluation context” [11], but
there’s nothing to evaluate here.







M. Odersky 



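Syntax:

  Names               $a, b, c, \ldots, x, y, z$
  Terms               $M, N ::= \mathbf{def}\ D ; M \mid x(\tilde{y}) \mid M \,\&\, N$
  Definitions         $D ::= L = M \mid D, D \mid \epsilon$
  Left-hand sides     $L ::= x(\tilde{y}) \mid L \,\&\, L$
  Reduction contexts  $R ::= [\,] \mid \mathbf{def}\ D ; R \mid R \,\&\, M \mid M \,\&\, R$

Structural Equivalence: α-renaming +

1. (&) on terms is associative and commutative:

  $M_1 \,\&\, M_2 \equiv M_2 \,\&\, M_1$
  $M_1 \,\&\, (M_2 \,\&\, M_3) \equiv (M_1 \,\&\, M_2) \,\&\, M_3$

2. (,) on definitions is associative and commutative with $\epsilon$ as an identity:

  $D_1, D_2 \equiv D_2, D_1$
  $D_1, (D_2, D_3) \equiv (D_1, D_2), D_3$
  $D, \epsilon \equiv D$

3. Scope extrusion:

  $(\mathbf{def}\ D ; M) \,\&\, N \equiv \mathbf{def}\ D ; (M \,\&\, N)$  if $\mathrm{dn}(D) \cap \mathrm{fn}(N) = \emptyset$.

Reduction:

  $\mathbf{def}\ D, L = M ; R[\sigma L] \to \mathbf{def}\ D, L = M ; R[\sigma M]$

where $\sigma$ is a substitution from names to names with $\mathrm{dom}(\sigma) \subseteq \mathrm{ln}(L)$.

Fig. 4. Canonical join calculus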



be associative and commutative, with the empty definition $\epsilon$ as identity. Finally,
we have a scope extrusion law, which states that the scope of a local definition 
may be extended dynamically over other operands of a parallel composition, 
provided this does not lead to clashes between names bound by the definition 
and free names of the terms that are brought in scope. 



There is still just one reduction rule, and this rule is essentially the same as 
in the functional subset. The major difference is that now a rewrite step may 
involve sets of function applications, which are composed in parallel. 



The laws of structural equivalence are necessary to bring parallel subterms
which are “far apart” next to each other, so that they can match the join pattern
of a left-hand side. For instance, in the following example of semaphore
synchronization two structural equivalences are necessary before rewrite steps
can be performed.
Syntax:

  Names               $a, b, c, \ldots, x, y, z$
  Identifiers         $I, J ::= x \mid I.x$
  Terms               $M, N ::= \mathbf{def}\ D ; M \mid I(\tilde{J}) \mid M \,\&\, N$
  Definitions         $D ::= L = M \mid D, D \mid \epsilon$
  Left-hand sides     $L ::= I(\tilde{x}) \mid L \,\&\, L$
  Reduction contexts  $R ::= [\,] \mid \mathbf{def}\ D ; R \mid R \,\&\, M \mid M \,\&\, R$

Structural Equivalence: α-renaming +

1. (&) on terms is associative and commutative:

  $M_1 \,\&\, M_2 \equiv M_2 \,\&\, M_1$
  $M_1 \,\&\, (M_2 \,\&\, M_3) \equiv (M_1 \,\&\, M_2) \,\&\, M_3$

2. (,) on definitions is associative and commutative with $\epsilon$ as an identity:

  $D_1, D_2 \equiv D_2, D_1$
  $D_1, (D_2, D_3) \equiv (D_1, D_2), D_3$
  $D, \epsilon \equiv D$

3. Scope extrusion:

  $(\mathbf{def}\ D ; M) \,\&\, N \equiv \mathbf{def}\ D ; (M \,\&\, N)$  if $\mathrm{dn}(D) \cap \mathrm{fn}(N) = \emptyset$.

Reduction:

  $\mathbf{def}\ D, L = M ; R[\sigma L] \to \mathbf{def}\ D, L = M ; R[\sigma M]$

where $\sigma$ is a substitution from names to identifiers with $\mathrm{dom}(\sigma) \subseteq \mathrm{ln}(L)$.

Fig. 5. Object-based join calculus



    def getLock(k) & releaseLock() = k();
    releaseLock() & (def k’() = f() & g(); getLock(k’))

  ≡ (by commutativity of &)

    def getLock(k) & releaseLock() = k();
    (def k’() = f() & g(); getLock(k’)) & releaseLock()

  ≡ (by scope extrusion)

    def getLock(k) & releaseLock() = k();
    def k’() = f() & g(); getLock(k’) & releaseLock()

  → def getLock(k) & releaseLock() = k(); def k’() = f() & g(); k’()

  → def getLock(k) & releaseLock() = k(); def k’() = f() & g(); f() & g()
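
The effect of the join rule, which fires only once all functions of its
left-hand side have pending calls, can be imitated by a small pool of pending
calls. The following Python sketch is a toy model under stated assumptions: it
ignores scoping and structural equivalence, supports only patterns in which
each function name occurs once, and all names (Soup, rule, send, step, run,
run_demo) are hypothetical.

```python
from collections import defaultdict, deque


class Soup:
    """A pool of pending calls together with a list of join rules."""

    def __init__(self):
        self.pending = defaultdict(deque)  # function name -> queued arguments
        self.rules = []                    # (pattern names, body) pairs

    def rule(self, names, body):
        # Assumes the names in one pattern are pairwise distinct.
        self.rules.append((tuple(names), body))

    def send(self, name, *args):
        self.pending[name].append(args)

    def step(self):
        """Fire the first rule whose whole pattern is matched; report success."""
        for names, body in self.rules:
            if all(self.pending[n] for n in names):
                args = [self.pending[n].popleft() for n in names]
                body(self, *args)
                return True
        return False

    def run(self):
        while self.step():
            pass


def run_demo():
    """Mirror of: def getLock(k) & releaseLock() = k()."""
    log = []
    soup = Soup()
    soup.rule(["getLock", "releaseLock"],
              lambda s, get_args, release_args: get_args[0]())
    soup.send("getLock", lambda: log.append("critical region"))
    fired_early = soup.step()  # no releaseLock call is pending yet
    soup.send("releaseLock")
    soup.run()
    return fired_early, log
```

In run_demo the continuation passed to getLock runs only after releaseLock has
been sent, which is the behaviour the two reduction steps above describe.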



6.3 Object-Based Calculus 

Figure 5 presents the final stage of our progression, object-based join calculus. 
The only syntactical addition with respect to Figure 4 is that identifiers can now 
be qualified names. A qualified name I is either a simple name x or a qualified
name followed by a period and a simple name. Qualified names can appear as 
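  $\mathrm{first}(x) = x$
  $\mathrm{first}(I.f) = \mathrm{first}(I)$

  $\mathrm{ln}(I(x_1, \ldots, x_n)) = \{x_1, \ldots, x_n\}$
  $\mathrm{ln}(L_1 \,\&\, L_2) = \mathrm{ln}(L_1) \cup \mathrm{ln}(L_2)$

  $\mathrm{dn}(I(\tilde{x})) = \{\mathrm{first}(I)\}$
  $\mathrm{dn}(L_1 \,\&\, L_2) = \mathrm{dn}(L_1) \cup \mathrm{dn}(L_2)$
  $\mathrm{dn}(D_1, D_2) = \mathrm{dn}(D_1) \cup \mathrm{dn}(D_2)$

  $\mathrm{fn}(I(J_1, \ldots, J_n)) = \{\mathrm{first}(I), \mathrm{first}(J_1), \ldots, \mathrm{first}(J_n)\}$
  $\mathrm{fn}(\mathbf{def}\ D ; M) = (\mathrm{fn}(D) \cup \mathrm{fn}(M)) \setminus \mathrm{dn}(D)$
  $\mathrm{fn}(M_1 \,\&\, M_2) = \mathrm{fn}(M_1) \cup \mathrm{fn}(M_2)$
  $\mathrm{fn}(L = M) = \mathrm{dn}(L) \cup (\mathrm{fn}(M) \setminus \mathrm{ln}(L))$
  $\mathrm{fn}(D_1, D_2) = \mathrm{fn}(D_1) \cup \mathrm{fn}(D_2)$

Fig. 6. Local, defined, and free names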



the operands of a function application and as defined function symbols in a 
definition. 

Perhaps surprisingly, this is all that changes! The structural equivalences and 
reduction rules stay exactly as they were formulated for canonical join calculus. 
However, a bit of care is required in the definition of permissible renamings. For 
instance, consider the following object-based functional net: 

def this.f(k) & g(x) = k(x) ; k’(this) & g(0) 

In this net, both this and g can be consistently renamed. For instance, the follo- 
wing expression would be considered equivalent to the previous one: 

def that.f(k) & h(x) = k(x) ; k’(that) & h(0) 

On the other hand, the qualified function symbol f cannot be renamed without 
changing the meaning of the expression. For instance, renaming f to e would 
yield: 

def this.e(k) & g(x) = k(x) ; k’(this) & g(0) 

This is clearly different from the expression we started with. The new expression 
passes a record with an e field to the continuation function k’, whereas the 
previous expressions passed a record with an f field. 

Figure 6 reflects these observations in the definition of local, defined, and free
names for object-based join calculus. Note that names occurring as field selectors
are neither free in a term, nor are they defined or local. Hence α-renaming does
not apply to record selectors.

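The α-renaming rule is now formalized as follows. Let a renaming $\theta$ be a
substitution from names to names which is injective when considered as a
function from $\mathrm{dom}(\theta)$ (remember that
$\mathrm{dom}(\theta) = \{x \mid \theta(x) \neq x\}$). Then,

  $\mathbf{def}\ D ; M \equiv \mathbf{def}\ \theta D ; \theta M$

if $\theta$ is a renaming with $\mathrm{dom}(\theta) \subseteq \mathrm{dn}(D)$
and $\mathrm{codom}(\theta) \cap (\mathrm{fn}(D) \cup \mathrm{fn}(M)) = \emptyset$.
Furthermore,

  $\mathbf{def}\ D, L = M ; N \equiv \mathbf{def}\ D, \theta L = \theta M ; N$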




Functional Nets 






Silk program:

  def newChannel = ( def this.read & this.write(x) = x ; this );
  val chan = newChannel;
  chan.read & chan.write(1)

Join calculus program and its reduction:

  def newChannel(k1) = (def this.read(k2) & this.write(x) = k2(x); k1(this));
  def k3(chan) = chan.read(k0) & chan.write(1);
  newChannel(k3)

→

  def newChannel(k1) = (def this.read(k2) & this.write(x) = k2(x); k1(this));
  def k3(chan) = chan.read(k0) & chan.write(1);
  def this’.read(k2’) & this’.write(x’) = k2’(x’);
  k3(this’)

→

  def newChannel(k1) = (def this.read(k2) & this.write(x) = k2(x); k1(this));
  def k3(chan) = chan.read(k0) & chan.write(1);
  def this’.read(k2’) & this’.write(x’) = k2’(x’);
  this’.read(k0) & this’.write(1)

→

  def newChannel(k1) = (def this.read(k2) & this.write(x) = k2(x); k1(this));
  def k3(chan) = chan.read(k0) & chan.write(1);
  def this’.read(k2’) & this’.write(x’) = k2’(x’);
  k0(1)

Fig. 7. Reduction involving an asynchronous channel object



if $\theta$ is a renaming with $\mathrm{dom}(\theta) \subseteq \mathrm{ln}(L)$ and $\mathrm{codom}(\theta) \cap \mathrm{fn}(M) = \emptyset$.

The definitions of Figure 6 and the α-renaming rule apply as stated to all
three versions of join calculus, not only to the final object-based version. When 
reduced to the simpler syntax of previous calculi, the new definitions are equi- 
valent to the old ones. 

As an example of object-based reduction consider the Silk program at the 
top of Figure 7. The program defines an asynchronous channel using function 
newChannel and then reads and writes that channel. 

This program is not yet in the form mandated by join calculus since it uses 
a synchronous function and a val definition. We can map this program into join 
calculus by adding continuation functions which make control flow for function 
returns and value definitions explicit. The second half of Figure 7 shows how this 
program is coded in object-based join calculus and how it is reduced. Schemes 
which map from our programming notation to join calculus are further discussed 
in the next section. 
7 Syntactic Abbreviations

Even the extended calculus discussed in the last section is a lot smaller than the 
Silk programming notation we have used in the preceding sections. This section 
fills the gap, by showing how Silk constructs which are not directly supported in 
object-based join calculus can be mapped into equivalent constructs which are 
supported. 



Direct style An important difference between Silk and join calculus is that Silk 
has synchronous functions and val definitions which bind the results of synchro- 
nous function applications. To see the simplifications afforded by these additions, 
it suffices to compare the Silk program of Figure 7 with its join calculus counter- 
part. The join calculus version is much more cluttered because of the occurrence 
of the continuations $k_i$. Programs which make use of synchronous functions and
value definitions are said to be in direct style, whereas programs that don’t are 
said to be in continuation passing style. Join calculus supports only continua- 
tion passing style. To translate direct style programs into join calculus, we need 
a continuation passing transform. This transformation gives each synchronous 
function an additional argument which represents a continuation function, to 
which the result of the synchronous function is then passed. 

The source language of the continuation passing transform is object-based
join calculus extended with result expressions $(I_1, \ldots, I_n)$ and value
definitions val $(x_1, \ldots, x_n)$ = M ; N. Single names in results and value
definitions are also included as they can be expressed as tuples of length 1.

For the sake of the following explanation, we assume different alphabets for
synchronous and asynchronous functions. We let $I^s$ range over identifiers whose
final selector is a synchronous function, whereas $I^a$ ranges over identifiers
whose final selector is an asynchronous function. In practice, we can distinguish
between synchronous and asynchronous functions also by means of a type system, so
that different alphabets are not required.

Our continuation passing transform for terms is expressed as a function TC 
which takes a term in the source language and a name representing a continuation 
as arguments, mapping these to a term in object-based join calculus. It makes 
use of a helper function TD which maps a definition in the source language 
to one in object-based join calculus. To emphasize the distinction between the 
transforms TC, TD and their syntactic expression arguments, we write syntactic 
expressions in [[ ]] brackets. The transforms are defined as follows. 



TC[[ val ($\tilde{x}$) = M ; N ]]k           =  def k’($\tilde{x}$) = TC[[ N ]]k ; TC[[ M ]]k’
TC[[ ($I_1, \ldots, I_n$) ]]k                =  k($I_1, \ldots, I_n$)
TC[[ $I^s$($J_1, \ldots, J_n$) ]]k           =  $I^s$($J_1, \ldots, J_n$, k)
TC[[ $I^a$($J_1, \ldots, J_n$) ]]k           =  $I^a$($J_1, \ldots, J_n$)
TC[[ def D ; M ]]k                           =  def TD[[ D ]] ; TC[[ M ]]k

TD[[ L = M ]]                                =  TC[[ L ]]k’ = TC[[ M ]]k’
TD[[ D, D’ ]]                                =  TD[[ D ]], TD[[ D’ ]]
TD[[ $\epsilon$ ]]                           =  $\epsilon$

Here, the k’ in the first equations for TC and TD represent fresh continuation
names.
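
The equations for TC and TD can be prototyped in a few lines. The Python
sketch below is an illustration under stated assumptions, not a reference
implementation: source terms are encoded as tagged tuples, each call carries a
flag saying whether it is synchronous (in place of separate alphabets), a join
pattern is transformed component-wise, output is produced as text, and fresh
continuations are named k1, k2, ... on the assumption that these names do not
occur in the source. The function name cps and the tuple encoding are
hypothetical.

```python
import itertools


def cps(term, k):
    """Continuation passing transform of `term` with result continuation `k`.

    Term encoding (hypothetical):
      ("val", xs, M, N)         val (xs) = M ; N
      ("result", ids)           (I1, ..., In)
      ("call", I, args, sync)   I(J1, ..., Jn); sync marks synchronous I
      ("def", rules, M)         def D ; M, with rules = [(lhs_calls, rhs)]
    """
    counter = itertools.count(1)

    def fresh():
        return f"k{next(counter)}"  # assumed not to clash with source names

    def paren(s):
        return f"({s})" if s.startswith("def ") else s

    def call(c, cont):
        _, name, args, sync = c
        actuals = list(args) + ([cont] if sync else [])
        return f"{name}({', '.join(actuals)})"

    def tc(t, k):
        tag = t[0]
        if tag == "val":
            _, xs, m, n = t
            k1 = fresh()
            return f"def {k1}({', '.join(xs)}) = {paren(tc(n, k))} ; {tc(m, k1)}"
        if tag == "result":
            return f"{k}({', '.join(t[1])})"
        if tag == "call":
            return call(t, k)
        if tag == "def":
            _, rules, m = t
            if not rules:                 # empty definition
                return tc(m, k)
            return f"def {td(rules)} ; {tc(m, k)}"
        raise ValueError(f"unknown term: {tag}")

    def td(rules):
        out = []
        for lhs, rhs in rules:
            k1 = fresh()                  # one fresh continuation per rule
            pattern = " & ".join(call(c, k1) for c in lhs)
            out.append(f"{pattern} = {paren(tc(rhs, k1))}")
        return ", ".join(out)

    return tc(term, k)


# val (y) = c.read ; print y, with c.read synchronous and print asynchronous
consumer_body = ("val", ("y",),
                 ("call", "c.read", (), True),
                 ("call", "print", ("y",), False))
```

For instance, cps(consumer_body, "k0") yields the text
def k1(y) = print(y) ; c.read(k1), in which the synchronous read has received
the fresh continuation k1 as an extra argument.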

The original paper on join [15] defines a different continuation passing
transform. That transform allows several functions in a join pattern to carry
results. Consequently, in the body of a function it has to be specified to which
of the functions of a left-hand side a result should be returned. The advantage
of this approach is that it simplifies the implementation of rendezvous
situations like the synchronous channel of Section 5. The disadvantage is a more
complex construct for function returns.

Structured Terms In Silk, the function part and arguments of a function ap- 
plication can be arbitrary terms, whereas join calculus admits only identifiers. 
Terms as function arguments can be expanded out by introducing names for 
intermediate results. 

  M($N_1, \ldots, N_k$)  ⇒  val x = M; val $y_1$ = $N_1$; ... val $y_k$ = $N_k$; x($y_1, \ldots, y_k$)

The resulting expression can be mapped into join calculus by applying the
continuation passing transform TC. The same principle is also applied in other
situations where structured terms appear yet only identifiers are supported. E.g.:

  ($M_1, \ldots, M_k$)  ⇒  val $x_1$ = $M_1$; ... val $x_k$ = $M_k$; ($x_1, \ldots, x_k$)

  M.f  ⇒  val x = M; x.f

We assume here that names in the expanded term which are not present in the 
original source term are fresh. 
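
The expansion of structured terms amounts to naming intermediate results. The
Python sketch below illustrates the idea under stated assumptions: terms are
encoded as nested tuples, output is produced as text, fresh names are t1, t2,
... (assumed not to occur in the source), and, as a small deviation from the
rules above, subterms that are already identifiers are left unnamed. The
function name expand and the tuple encoding are hypothetical.

```python
import itertools


def expand(term):
    """Flatten a structured term into val bindings followed by a simple term.

    Term encoding (hypothetical):
      "x"                     an identifier
      ("app", M, [N1, ...])   application M(N1, ..., Nk)
      ("sel", M, "f")         selection M.f
    """
    counter = itertools.count(1)
    bindings = []

    def name_of(t):
        # Identifiers need no binding; structured subterms get a fresh name.
        if isinstance(t, str):
            return t
        rhs = simple(t)                 # first bind the subterms of t
        x = f"t{next(counter)}"         # assumed fresh with respect to source
        bindings.append(f"val {x} = {rhs}")
        return x

    def simple(t):
        if t[0] == "app":
            _, fn, args = t
            f = name_of(fn)
            ys = [name_of(a) for a in args]
            return f"{f}({', '.join(ys)})"
        if t[0] == "sel":
            _, obj, field = t
            return f"{name_of(obj)}.{field}"
        raise ValueError(f"unknown term: {t[0]}")

    body = term if isinstance(term, str) else simple(term)
    return "; ".join(bindings + [body])


# f(g(a), h(b).c)
nested = ("app", "f", [("app", "g", ["a"]),
                       ("sel", ("app", "h", ["b"]), "c")])
```

Applied to nested, expand produces
val t1 = g(a); val t2 = h(b); val t3 = t2.c; f(t1, t3), which contains only
applications and selections on identifiers and can then be passed through the
continuation passing transform.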



8 Conclusion and Related Work 

The first five sections of this paper have shown how a large variety of program 
constructs can be modelled as functional nets. The last two sections have shown 
how functional nets themselves can be expressed in object-based join calculus. 
Taken together, these steps constitute a reductionistic approach, where a large 
body of notations and patterns of programs is to be distilled into a minimal 
kernel. The reduction to essentials is useful since it helps clarify the meaning of 
derived program constructs and the interactions between them. 

Ever since the inception of Lisp [26] and Landin’s ISWIM [25], functional
programming has pioneered the idea of developing programming languages from
calculi. Since then, there has been an extremely large body of work which aims to
emulate the FP approach in a more general setting. One strand of work has
devised extensions of lambda calculus with state [13,34,36,28,3] or
non-determinism and concurrency [7,12,9]. Another strand of work has designed
concurrent functional languages [19,33,2] based on some other operational
semantics. Landin’s programme has also been repeated in the concurrent
programming field, for instance with Occam and CSP [22], Pict [30] and
π-calculus [27], or Oz and its kernel [35].
Our approach is closest to the work on join calculus [15,16,17,14]. Largely,
functional nets as described here constitute a simplification and streamlining 
of the original treatment of join, with object-based join calculus and qualified 
definitions being the main innovation. 
Abstract. Recent work has shown equivalences between various type
systems and flow logics. Ideally, the translations upon which such
equivalences are based should be faithful in the sense that information is
not lost in round-trip translations from flows to types and back or from
types to flows and back. Building on the work of Nielson & Nielson
and of Palsberg & Pavlopoulou, we present the first faithful translations
between a class of finitary polyvariant flow analyses and a type system
supporting polymorphism in the form of intersection and union types.
Additionally, our flow/type correspondence solves several open problems
posed by Palsberg & Pavlopoulou: (1) it expresses call-string based
polyvariance (such as k-CFA) as well as argument based polyvariance; (2)
it enjoys a subject reduction property for flows as well as for types; and
(3) it supports a flow-oriented perspective rather than a type-oriented
one.



1 Introduction 

Type systems and flow logic are two popular frameworks for specifying program
analyses. While these frameworks seem rather different on the surface, both
describe the “plumbing” of a program, and recent work has uncovered deep
connections between them. For example, Palsberg and O’Keefe [PO95] demonstrated
an equivalence between determining flow safety in the monovariant 0-CFA flow
analysis and typability in a system with recursive types and subtyping [AC93].
Heintze showed equivalences between four restrictions of 0-CFA and four type
systems parameterized by (1) subtyping and (2) recursive types [Hei95].

Because they merge flow information for all calls to a function, monovari- 
ant analyses are imprecise. Greater precision can be obtained via polyvariant 
analyses, in which functions can be analyzed in multiple abstract contexts. Ex- 
amples of polyvariant analyses include call-string based approaches, such as 
k-CFA [Shi91,JW95,NN97], polymorphic splitting [WJ98], type-directed flow 
analysis [JWW97], and argument based polyvariance, such as Schmidt’s ana- 
lysis [Sch95] and Agesen’s cartesian product analysis [Age95]. In terms of the 

* Both authors were supported by NSF grant EIA-9806747. This work was conducted 
as part of the Church Project (http://www.cs.bu.edu/groups/church/). 
flow/type correspondence, several forms of flow polyvariance appear to correspond
to type polymorphism expressed with intersection and union types
[Ban97,WDMT97,DMTW97,PP99]. Intuitively, intersection types are finitary 
polymorphic types that model the multiple analyses for a given abstract closure, 
while union types are finitary existential types that model the merging of ab- 
stract values where flow paths join. Palsberg and Pavlopoulou (henceforth P&P) 
were the first to formalize this correspondence by demonstrating an equivalence 
between a class of flow analyses supporting argument based polyvariance and a 
type system with union and intersection types [PP99]. 

If type and flow systems encode similar information, translations between 
the two should be faithful, in the sense that round-trip translations from flow 
analyses to type derivations and back (or from type derivations to flow analyses 
and back) should not lose precision. Faithfulness formalizes the intuitive notion 
that a flow analysis and its corresponding type derivation contain the same information
content. Interestingly, neither the translations of Palsberg and O’Keefe
nor those of P&P are faithful. The lack of faithfulness in P&P is demonstrated
by a simple example. Let $e = (\lambda^1 x.\mathsf{succ}\ x)\ @\ ((\lambda^2 y.y)\ @\ 3)$, where we have labeled
two program points of interest. Consider an initial monovariant flow analysis in
which the only abstract closure reaching point 1 is $v_1 = (\lambda x.\mathsf{succ}\ x, [\,])$ and the
only one reaching point 2 is $v_2 = (\lambda y.y, [\,])$. The flow-to-type translation of P&P
yields the expected type derivation:

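$$
\dfrac{[\,] \vdash \lambda^1 x.\mathsf{succ}\ x : \mathsf{int} \to \mathsf{int}
       \qquad
       \dfrac{[\,] \vdash \lambda^2 y.y : \mathsf{int} \to \mathsf{int} \qquad \cdots}
             {[\,] \vdash (\lambda^2 y.y)\ @\ 3 : \mathsf{int}}}
      {[\,] \vdash (\lambda^1 x.\mathsf{succ}\ x)\ @\ ((\lambda^2 y.y)\ @\ 3) : \mathsf{int}}
$$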

However, P&P’s type-to-flow translation loses precision by merging into a
single set all abstract closures associated with the same type in a given derivation.
For the example derivation above, the type $\mathsf{int} \to \mathsf{int}$ translates back to the
abstract closure set $V = \{v_1, v_2\}$, yielding a less precise flow analysis in which
$V$ flows to both points 1 and 2. In contrast, Heintze’s translations are faithful.
The undesirable merging in the above example is avoided by annotating function
types with a label set indicating the source point of the function value. Thus,
$\lambda^1 x.\mathsf{succ}\ x$ has type $\mathsf{int} \xrightarrow{\{1\}} \mathsf{int}$ while $\lambda^2 y.y$ has type $\mathsf{int} \xrightarrow{\{2\}} \mathsf{int}$.
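
The loss of precision described above can be illustrated with a small sketch. The encoding below is hypothetical (the dictionary layout and the function name `flows_by_type` are not from P&P or Heintze); it only shows that grouping closures by an unannotated type merges $v_1$ and $v_2$, while grouping by a type annotated with the source label keeps them apart.

```python
from collections import defaultdict

# Hypothetical encoding of the example: each abstract closure records the
# program point it reaches and the type it receives in the derivation.
CLOSURES = {
    "v1": {"point": 1, "type": "int -> int"},  # (\x. succ x, [])
    "v2": {"point": 2, "type": "int -> int"},  # (\y. y, [])
}


def flows_by_type(closures, annotated):
    """Map each program point to the closures sharing its (possibly annotated) type."""
    def key(info):
        # With annotations, the arrow also carries the label of its source point.
        return (info["type"], info["point"]) if annotated else info["type"]

    groups = defaultdict(set)
    for name, info in closures.items():
        groups[key(info)].add(name)
    return {info["point"]: frozenset(groups[key(info)]) for info in closures.values()}


merged = flows_by_type(CLOSURES, annotated=False)    # both points see {v1, v2}
separate = flows_by_type(CLOSURES, annotated=True)   # each point sees its own closure
```

The round trip through unannotated types is lossy precisely because the grouping key forgets where a function value came from; adding the label to the key restores the original analysis.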

In this paper, we present the first faithful translations between a broad class
of polyvariant flow analyses and a type system with polymorphism in the form
of intersection and union types. The translations are faithful in the sense that a
round-trip translation acts as the identity for canonical types/flows, and otherwise
canonicalizes. In particular, our round-trip translation for types preserves
non-recursive types that P&P may transform to recursive types. We achieve
this result by adapting the translations of P&P to use a modified version of the
flow analysis framework of Nielson and Nielson (henceforth N&N) [NN97]. As
in Heintze’s translations, annotations play a key role in the faithfulness of our
translations: we (1) annotate flow values to indicate the sinks to which they flow,
and (2) annotate union and intersection types with component labels. These annotations
can be justified independently of the flow/type correspondence.
Additionally, our framework solves several open problems posed by P&P:

1. Unifying P&P and N&N: Whereas P&P’s flow specification can readily 
handle only argument based polyvariance, N&N’s flow specification can also 
express call-string based polyvariance. So our translations give the first type 
system corresponding to k-CFA analysis where $k \geq 1$.

2. Subject reduction for flows: We inherit from N&N’s flow logic the property 
that flow information valid before a reduction step is still valid afterwards. 
In contrast, P&P’s flow system does not have this property. 

3. Letting “flows have their way”: P&P discuss mismatches between flow and 
type systems that imply the need to choose one perspective over the other 
when designing a translation between the two systems. P&P always let types 
“have their way”; for example they require analyses to be finitary and to
analyze all closure bodies, even though they may be dead code. In contrast, 
our design also lets flows “have their way” , in that our type system does not 
require all subexpressions to be analyzed. 

Due to space limitations, the following presentation is necessarily somewhat 
dense. Please see the companion technical report [AT00] for a more detailed
exposition with additional explanatory text, more examples, and proofs. 



2 The Language 

We consider a language whose core is $\lambda$-calculus with recursion:

$ue \in UnLabExpr ::= z \mid \mu f.\lambda x.e \mid e\ @\ e \mid c \mid \mathsf{succ}\ e \mid \mathsf{if0}\ e\ \mathsf{then}\ e\ \mathsf{else}\ e \mid \ldots$
$e \in LabExpr ::= ue^l \qquad l \in Lab \qquad z \in Var ::= x \mid f \qquad x \in NVar \qquad f \in RVar$

$\mu f.\lambda x.e$ denotes a function with parameter $x$ which may call itself via $f$; $\lambda x.e$ is
a shorthand for $\mu f.\lambda x.e$ where $f$ does not occur in $e$. Recursive variables (ranged
over by $f$) and non-recursive variables (ranged over by $x$) are distinct; $z$ ranges
over both. There are also integer constants $c$, the successor function, and the
ability to test for zero. Other constructs might be added, e.g., let¹.

All subexpressions have integer labels. We often write labels on constructors
(e.g., write $\lambda^l x.e$ for $(\lambda x.e)^l$ and $e_1\ @_l\ e_2$ for $(e_1\ @\ e_2)^l$).

Example 1. The expression $(\lambda^6 g.((g^3\ @_2\ g^4)\ @_1\ 0^5))\ @_0\ (\lambda^8 x.x^7)$ shows the
need for polyvariance: $\lambda^8 x.x^7$ is applied both to itself and to an integer.

Like N&N, but unlike P&P, we use an environment-based small step se- 
mantics. This requires incorporating N&N’s bind and close constructs into our 
expression syntax. An expression not containing bind or close is said to be pure. 
Every abstraction body must be pure. A program P is a pure, closed expression 
where each label occurs at most once within $P$; thus each subexpression of $P$
($\in SubExpr_P$) denotes a unique “position” within $P$.
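
The labelled syntax can be sketched as a small data type. This is only an illustration under stated assumptions: the class names, the restriction to variables, constants, functions and applications, and the helper `has_unique_labels` are ours, and the labels of the sample term follow our reading of Example 1. Purity and closedness are not checked.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Var:
    name: str
    label: int


@dataclass(frozen=True)
class Const:
    value: int
    label: int


@dataclass(frozen=True)
class Fun:
    f: Optional[str]  # recursive variable, or None for a plain lambda
    x: str
    body: "Expr"
    label: int


@dataclass(frozen=True)
class App:
    fn: "Expr"
    arg: "Expr"
    label: int


Expr = Union[Var, Const, Fun, App]


def labels(e):
    """Collect the labels of all subexpressions of e."""
    if isinstance(e, (Var, Const)):
        return [e.label]
    if isinstance(e, Fun):
        return [e.label] + labels(e.body)
    return [e.label] + labels(e.fn) + labels(e.arg)


def has_unique_labels(e):
    """A program requires each label to occur at most once."""
    found = labels(e)
    return len(found) == len(set(found))


# (lambda^6 g. ((g^3 @_2 g^4) @_1 0^5)) @_0 (lambda^8 x. x^7)
EXAMPLE_1 = App(
    Fun(None, "g", App(App(Var("g", 3), Var("g", 4), 2), Const(0, 5), 1), 6),
    Fun(None, "x", Var("x", 7), 8),
    0,
)
```

Because labels are unique, a label can stand for a subexpression, which is what lets the analyses later index their caches by label.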

¹ Let-polymorphism can be simulated by intersection types.
3 The Type System

Types are built from base types, function types, intersection types, and union 
types as follows (where ITag and UTag are unspecified): 

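$t \in ElementaryType ::= \mathsf{int} \mid \bigwedge_{i \in I}\{K_i : u_i \to u'_i\}$

$u \in UnionType ::= \bigvee_{i \in I}\{q_i : t_i\}$

$K \in \mathcal{P}(ITag) \qquad k \in ITag \qquad q \in UTag$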

Such grammars are usually interpreted inductively, but this one is to be viewed 
co-inductively. That is, types are regular (possibly infinite) trees formed accor- 
ding to the above specification. Two types are considered equal if their infinite 
unwindings are equal (modulo renaming of the index sets I). 

An elementary type $t$ is either an integer $\mathsf{int}$ or an intersection type of the
form $\bigwedge_{i \in I}\{K_i : u_i \to u'_i\}$, where $I$ is a (possibly empty) finite index set, each $u_i$
and $u'_i$ is a union type, and the $K_i$’s, known as I-tagsets, are non-empty disjoint
sets of I-tags. We write $dom(t)$ for $\bigcup_{i \in I} K_i$. Intuitively, if an expression $e$ has
the above intersection type then for all $i \in I$ it holds that the expression maps
values of type $u_i$ into values of type $u'_i$.

A union type $u$ has the form $\bigvee_{i \in I}\{q_i : t_i\}$, where $I$ is a (possibly empty)
finite index set, each $t_i$ is an elementary type, and the $q_i$ are distinct U-tags. We
write $dom(u)$ for $\bigcup_{i \in I}\{q_i\}$, and $u.q = t$ if there exists $i \in I$ such that $q = q_i$ and
$t = t_i$. We assume that for all $i \in I$ it holds that $t_i = \mathsf{int}$ iff $q_i = q_{int}$ where $q_{int}$
is a distinguished U-tag. Intuitively, if an expression $e$ has the above union type
then there exists an $i \in I$ such that $e$ has the elementary type $t_i$.

If $I = \{1 \cdots n\}$ ($n \geq 0$), we write $\bigvee(q_1 : t_1, \cdots, q_n : t_n)$ for $\bigvee_{i \in I}\{q_i : t_i\}$ and
write $\bigwedge(K_1 : u_1 \to u'_1, \cdots, K_n : u_n \to u'_n)$ for $\bigwedge_{i \in I}\{K_i : u_i \to u'_i\}$. We write
$u_{int}$ for $\bigvee(q_{int} : \mathsf{int})$.

The type system is much as in P&P except for the presence of tags. These 
annotations serve as witnesses for existentials in the subtyping relation and play 
crucial roles in the faithfulness of our flow/type correspondence. U-tags track 
the “source” of each intersection type and help to avoid the precision-losing 
merging seen in P&P’s type-to-flow translation (cf. Sect. 1). I-tagsets track the 
“sinks” of each arrow type and help to avoid unnecessary recursive types in the 
flow-to-type translation. 



3.1 Subtyping 

We define an ordering $\leq$ on union types and an ordering $\leq_\wedge$ on elementary types,
where $u \leq u'$ means that $u'$ is less precise than $u$ and similarly for $\leq_\wedge$. To capture
the intuition that something of type $t_1$ has one of the types $t_1$ or $t_2$, $\leq$ should
satisfy $\bigvee(q_1 : t_1) \leq \bigvee(q_1 : t_1, q_2 : t_2)$. For $\leq_\wedge$, we want to capture the following
intuition: a function that can be assigned both types $u_1 \to u'_1$ and $u_2 \to u'_2$
also (1) can be assigned one of them² and (2) can be assigned a function type

² I.e., for $i \in \{1, 2\}$, $\bigwedge(K_1 : u_1 \to u'_1, K_2 : u_2 \to u'_2) \leq_\wedge \bigwedge(K_i : u_i \to u'_i)$.
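that “covers” both³. The following mutually recursive specification of $\leq$ and $\leq_\wedge$
formalizes these considerations:

$\bigvee_{i \in I}\{q_i : t_i\} \leq \bigvee_{j \in J}\{q'_j : t'_j\}$
    iff for all $i \in I$ there exists $j \in J$ such that $q_i = q'_j$ and $t_i \leq_\wedge t'_j$

$\mathsf{int} \leq_\wedge \mathsf{int}$

$\bigwedge_{i \in I}\{K_i : u_i \to u'_i\} \leq_\wedge \bigwedge_{j \in J}\{K'_j : u''_j \to u'''_j\}$
    iff for all $j \in J$ there exists $I_0 \subseteq I$ such that
    $K'_j = \bigcup_{i \in I_0} K_i$ and $\forall i \in I_0.\ u'_i \leq u'''_j$ and
    $\forall q \in dom(u''_j).\ \exists i \in I_0.\ q \in dom(u_i)$ and $u''_j.q \leq_\wedge u_i.q$.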

Observe that if $t \leq_\wedge t'$, then $dom(t') \subseteq dom(t)$. The above specification is not yet
a definition of $\leq$ and $\leq_\wedge$, since types may be infinite. However, it gives rise to a
monotone functional on a complete lattice whose elements are pairs of relations;
$\leq$ and $\leq_\wedge$ are then defined as the (components of) the greatest fixed point of
this functional. Coinduction yields:

Lemma 1. The relations $\leq$ and $\leq_\wedge$ are reflexive and transitive.
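
For regular types, the greatest fixed point can be computed by starting from the full relations and repeatedly discarding pairs that violate the specification. The sketch below follows our reading of that specification and is not the authors' algorithm; the graph encoding (`defs` maps a node name to `("int",)`, `("inter", [(K, arg, result), ...])` or `("union", {q: elementary})`) and all names are assumptions. It relies on the I-tagsets of a type being disjoint and non-empty, so the witness set $I_0$ for a tagset $K'_j$ can only be the set of components whose tagset lies inside $K'_j$.

```python
def subtype_relations(defs):
    """Greatest fixed point of the subtyping specification over the nodes of defs."""
    unions = [n for n, d in defs.items() if d[0] == "union"]
    elems = [n for n, d in defs.items() if d[0] != "union"]
    le_u = {(a, b) for a in unions for b in unions}  # candidate pairs for <=
    le_e = {(a, b) for a in elems for b in elems}    # candidate pairs for <=_and

    def ok_union(a, b):
        ta, tb = defs[a][1], defs[b][1]
        # U-tags are distinct, so the matching component of b is fixed by the tag.
        return all(q in tb and (ta[q], tb[q]) in le_e for q in ta)

    def ok_elem(a, b):
        da, db = defs[a], defs[b]
        if da[0] == "int" or db[0] == "int":
            return da[0] == db[0]
        for k_j, arg_j, res_j in db[1]:
            # Disjoint, non-empty tagsets leave only one candidate for I_0.
            i0 = [(k, arg, res) for k, arg, res in da[1] if k <= k_j]
            if frozenset().union(*(k for k, _, _ in i0)) != k_j:
                return False
            if any((res, res_j) not in le_u for _, _, res in i0):
                return False
            for q, t_q in defs[arg_j][1].items():
                if not any(q in defs[arg][1] and (t_q, defs[arg][1][q]) in le_e
                           for _, arg, _ in i0):
                    return False
        return True

    changed = True
    while changed:  # discard violating pairs until nothing changes
        changed = False
        for rel, ok in ((le_u, ok_union), (le_e, ok_elem)):
            for pair in list(rel):
                if not ok(*pair):
                    rel.discard(pair)
                    changed = True
    return le_u, le_e


F = frozenset
DEFS = {  # the types used for the identity function in the running example
    "t_int": ("int",),
    "u_int": ("union", {"q_int": "t_int"}),
    "t1": ("inter", [(F({1}), "u_int", "u_int")]),
    "u_x'": ("union", {"q_x": "t1"}),
    "t2": ("inter", [(F({1}), "u_int", "u_int"), (F({2}), "u_x'", "u_x'")]),
    "u_x": ("union", {"q_x": "t2"}),
    "t3": ("inter", [(F({2}), "u_x'", "u_x'")]),
}
LE_U, LE_E = subtype_relations(DEFS)
```

Because a node may refer to itself through `defs`, the same procedure covers recursive (infinite but regular) types; the tags are what remove the search over subsets $I_0$.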

Our subtyping relation differs from P&P’s in several ways. The U-tags and
I-tags serve as “witnesses” for the existential quantifiers present in the specification,
reducing the need for search during type checking. Finally, our $\leq$ seems
more natural than P&P’s $\leq_1$, which is not a congruence and in fact has the
rather odd property that if $\bigvee(T_1, T_2) \leq_1 \bigvee(T_3, T_4)$ (with the $T_i$’s all distinct),
then either $\bigvee(T_1, T_2) \leq_1 T_3$ or $\bigvee(T_1, T_2) \leq_1 T_4$.



3.2 Typing Rules 

A typing $T$ for a program $P$ is a tuple $(P, IT_T, UT_T, D_T)$, where $IT_T$ is a finite
set of I-tags, $UT_T$ is a finite set of U-tags, and $D_T$ is a derivation of $[\,] \vdash P : u$
according to the inference rules given in Fig. 1. In a judgement $A \vdash e : u$, $A$ is
an environment with bindings of the form $[z \mapsto u]$; we require that all I-tags in
$D_T$ belong to $IT_T$ and that all U-tags in $D_T$ belong to $UT_T$.

Subtyping has been inlined in all of the rules to simplify the type/flow correspondence.
The rules for function abstraction and function application are
both instrumented with a “witness” that enables reconstructing the justification
for applying the rule. In [app]$^{w^@}$, the type of the operator is a (possibly empty)
union, all components of which have the expected function type but the I-tagsets
may differ; the app-witness $w^@$ is a partial mapping from $dom(u_1)$ that given
$q$ produces the corresponding I-tagset. In [fun]$^{(q:t)}$, the function types resulting
from analyzing the body in several different environments are combined into an
intersection type $t$. This is wrapped into a union type with an arbitrary U-tag $q$,

³ I.e., $\bigwedge(K_1 : u_1 \to u'_1, K_2 : u_2 \to u'_2) \leq_\wedge \bigwedge(K_1 \cup K_2 : u_{12} \to u'_{12})$, where any value
having one of the types $u'_1$ or $u'_2$ also has type $u'_{12}$, and where any value having type
$u_{12}$ also has one of the types $u_1$ or $u_2$.
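[var]  $A \vdash z^l : u$  if $A(z) \leq u$

[fun]$^{(q:t)}$  $\dfrac{\forall k \in K:\ A[f \mapsto u''_k,\ x \mapsto u_k] \vdash e : u'_k}{A \vdash \mu f.\lambda^l x.e : u}$  if $t = \bigwedge_{k \in K}\{\{k\} : u_k \to u'_k\}$ $\wedge$ $\bigvee(q : t) \leq u$ $\wedge$ $\forall k \in K.\ \bigvee(q : t) \leq u''_k$

[app]$^{w^@}$  $\dfrac{A \vdash e_1 : u_1 \qquad A \vdash e_2 : u_2}{A \vdash e_1\ @_l\ e_2 : u}$  if $\forall q \in dom(u_1).\ u_1.q \leq_\wedge \bigwedge(w^@(q) : u_2 \to u)$

[con]  $A \vdash c^l : u$  if $u_{int} \leq u$

[suc]  $\dfrac{A \vdash e_1 : u_1}{A \vdash \mathsf{succ}^l\ e_1 : u}$  if $u_1 \leq u_{int} \leq u$

[if]  $\dfrac{A \vdash e_0 : u_0 \qquad A \vdash e_1 : u_1 \qquad A \vdash e_2 : u_2}{A \vdash \mathsf{if0}^l\ e_0\ \mathsf{then}\ e_1\ \mathsf{else}\ e_2 : u}$  if $u_0 \leq u_{int} \wedge u_1 \leq u \wedge u_2 \leq u$

Fig. 1. The typing rules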

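$$
\dfrac{
  \dfrac{
    \dfrac{\dfrac{A_g \vdash g^3 : u_x \qquad A_g \vdash g^4 : u'_x}{A_g \vdash g^3\ @_2\ g^4 : u'_x} \qquad A_g \vdash 0^5 : u_{int}}
          {A_g \vdash (g^3\ @_2\ g^4)\ @_1\ 0^5 : u_{int}}
  }{[\,] \vdash \lambda^6 g.((g^3\ @_2\ g^4)\ @_1\ 0^5) : u_g}
  \qquad
  \dfrac{A_x \vdash x^7 : u_{int} \qquad A'_x \vdash x^7 : u'_x}{[\,] \vdash \lambda^8 x.x^7 : u_x}
}{[\,] \vdash (\lambda^6 g.((g^3\ @_2\ g^4)\ @_1\ 0^5))\ @_0\ (\lambda^8 x.x^7) : u_{int}}
$$
Fig. 2. A derivation for the program from Example 1.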



which provides a way of keeping track of the origin of a function type (cf. Sects. 1 
and 5). Accordingly, the fun-witness of this inference is the pair (q : t). Note 
that K may be empty in which case the body is not analyzed. 

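Example 2. For the program from Ex. 1, we can construct a typing as
follows: the set of I-tags is $\{0, 1, 2\}$, the set of U-tags is $\{q_x, q_g\}$, and the
derivation is as in Fig. 2, where

$u'_x = \bigvee(q_x : \bigwedge(\{1\} : u_{int} \to u_{int}))$
$u_x = \bigvee(q_x : \bigwedge(\{1\} : u_{int} \to u_{int},\ \{2\} : u'_x \to u'_x))$
$u_g = \bigvee(q_g : \bigwedge(\{0\} : u_x \to u_{int}))$
$A_g = [g \mapsto u_x] \qquad A_x = [x \mapsto u_{int}] \qquad A'_x = [x \mapsto u'_x]$

Note that $u_x \leq u'_x$, and that $u_x.q_x \leq_\wedge \bigwedge(\{2\} : u'_x \to u'_x)$ so that $\{q_x \mapsto \{2\}\}$ is
indeed an app-witness for the inference at the top left of Fig. 2.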

The type system in Fig. 1 can be augmented with rules for bind and close 
such that the resulting system satisfies a subject reduction property. The so- 
undness of the type system follows from subject reduction, since “stuck” expres- 
sions (such as 7 @ 9) are not typable. 

In a typing $T$ for $P$, for each $e$ in $SubExpr_P$ there may be several judgements
for $e$ in $D_T$, due to the multiple analyses performed by [fun]. We assign to each
judgement $J$ for $e$ in $D_T$ an environment $ke$ (its address) that for all applications
of [fun] in the path from the root of $D_T$ to $J$ associates the bound variables with
the branch taken. In the derivation of Fig. 2, the judgement $A_x \vdash x^7 : u_{int}$ has address
$[x \mapsto 1]$ and the judgement $A'_x \vdash x^7 : u'_x$ has address $[x \mapsto 2]$.

The translation in Sect. 5 requires that a typing must be uniform, i.e., the
following partial function $\mathcal{A}_T$ must be well-defined: $\mathcal{A}_T(z, k) = u$ iff $D_T$ contains
a judgement of the form $A \vdash e : u'$ with address $ke$, where $ke(z) = k$ and
$A(z) = u$. For the typing of Example 2 we have, e.g., $\mathcal{A}_T(x, 1) = u_{int}$ and $\mathcal{A}_T(x, 2) = u'_x$.



4 The Flow System 

Our system for flow analysis has the form of a flow logic, in the style of N&N.
A flow analysis $F$ for program $P$ is a tuple $(P, Mem_F, C_F, \rho_F, \Phi_F)$ whose components
are explained below (together with some auxiliary derived concepts).

Polyvariance is modeled by mementoes, where a memento ($m \in Mem_F$)
represents a context for analyzing the body of a function. We shall assume
that $Mem_F$ is non-empty and finite; then all other entities occurring in $F$ will
also be finite. Each expression $e$ is analyzed wrt. several different memento environments,
where the entries of a memento environment ($me \in MemEnv_F$)
take the form $[z \mapsto m]$ with $m$ in $Mem_F$. Accordingly, a flow configuration
($\in FlowConf_F$) is a pair $(e, me)$, where $FV(e) \subseteq dom(me)$.

The goal of the flow analysis is to associate a set of flow values to each
configuration, where a flow value ($v \in FlowVal_F$) is either an integer Int or of the
form $(ac, M)$, where $ac$ ($\in AbsClos_F$) is an abstract closure of the form $(fn, me)$
with $fn$ a function $\mu f.\lambda x.e$ and $FV(fn) \subseteq dom(me)$, and where $M \subseteq Mem_F$. The
$M$ component can be thought of as a superset of the “sinks” of the abstract closure
$ac$, i.e. the contexts in which it is going to be applied. Our flow values differ from
N&N’s in two respects: (i) they do not include the memento that corresponds
to the point of definition; (ii) they do include the mementoes of use (the $M$
component), in order to get a flow system that is almost isomorphic to the
type system of Sect. 3. This extension does not make it harder to analyze an
expression, since one might just let $M = Mem_F$ everywhere.

A flow set $V$ ($\in FlowSet_F$) is a set of flow values, with the property that if
$(ac, M_1) \in V$ and $(ac, M_2) \in V$ then $M_1 = M_2$. We define an ordering on $FlowSet_F$
by stipulating that $V_1 \leq_V V_2$ iff for all $v_1 \in V_1$ there exists $v_2 \in V_2$ such that
$v_1 \leq_v v_2$, where the ordering $\leq_v$ on $FlowVal_F$ is defined by stipulating that
Int $\leq_v$ Int and that $(ac, M_1) \leq_v (ac, M_2)$ iff $M_2 \subseteq M_1$. Note that if $V_1 \leq_V V_2$ then
$V_2$ is obtained from $V_1$ by adding some “sources” and removing some “sinks” (in
a sense moving along a “flow path” from a source to a sink), so in that respect
the ordering is similar to the type ordering in [WDMT97].
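
The two orderings are easy to state executably. In this sketch the representation is an assumption of ours: the integer flow value is the string `"Int"`, and every other flow value is a pair of an abstract-closure name and a frozenset of mementoes.

```python
INT = "Int"


def leq_value(v1, v2):
    """v1 <=_v v2: same closure, and v2 keeps only a subset of v1's sinks."""
    if v1 == INT or v2 == INT:
        return v1 == v2
    (ac1, sinks1), (ac2, sinks2) = v1, v2
    return ac1 == ac2 and sinks2 <= sinks1


def leq_set(flow_set1, flow_set2):
    """V1 <=_V V2: every value of V1 is below some value of V2."""
    return all(any(leq_value(v1, v2) for v2 in flow_set2) for v1 in flow_set1)


def is_flow_set(values):
    """A flow set may not pair one abstract closure with two different sink sets."""
    seen = {}
    for v in values:
        if v == INT:
            continue
        ac, sinks = v
        if seen.setdefault(ac, sinks) != sinks:
            return False
    return True


v_x = ("ac_x", frozenset({1, 2}))
v_x_prime = ("ac_x", frozenset({1}))
```

Moving up in the ordering adds sources (more values in the set) and drops sinks (smaller memento sets), matching the remark about moving along a flow path.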

$\Phi_F$ is a partial mapping from $(Labs_P \times MemEnv_F) \times AbsClos_F$ to $\mathcal{P}(Mem_F)$,
where $Labs_P$ is the set of labels occurring in $P$. Intuitively, if the abstract closure
$ac$ in the context $me$ is applied to an expression with label $l$, then $\Phi_F((l, me), ac)$
denotes the actual sinks of $ac$.
$C_F$ is a mapping from $Labs_P \times MemEnv_F$ to $(FlowSet_F)_\bot$. Intuitively, if
$C_F(l, me) = V$ ($\neq \bot$) and $C_F$ is valid (defined below) for the flow configuration
$(ue^l, me)$ then all semantic values that $ue^l$ may evaluate to in a semantic environment
approximated by $me$ can be approximated by the set $V$. Similarly,
$\rho_F(z, m)$ approximates the set of semantic values to which $z$ may be bound when
analyzed in memento $m$.

Unlike N&N, we distinguish between $C_F(l, me)$ being the empty set and being
$\bot$. The latter means that no flow configuration $(ue^l, me)$ is “reachable”, and so
there is no need to analyze it. The relation $\leq_V$ on $FlowSet_F$ is lifted to a relation
$\leq_V$ on $(FlowSet_F)_\bot$.

Example 3. For the program from Ex. 1, a flow analysis with memento set
$\{0, 1, 2\}$ is given below. We have named some entities (note that $v_x \leq_v v'_x$):

$me_g = [g \mapsto 0] \qquad ac_g = (\lambda g.\cdots, [\,]) \qquad v_g = (ac_g, \{0\})$
$me_{x1} = [x \mapsto 1] \qquad ac_x = (\lambda x.x^7, [\,]) \qquad v'_x = (ac_x, \{1\})$
$me_{x2} = [x \mapsto 2] \qquad v_x = (ac_x, \{1, 2\})$

$C$ and $\rho$ are given by the entries below (all others are $\bot$):

$\{v_g\} = C(6, [\,])$
$\{\mathsf{Int}\} = \rho(x, 1) = C(7, me_{x1}) = C(5, me_g) = C(1, me_g) = C(0, [\,])$
$\{v'_x\} = \rho(x, 2) = C(7, me_{x2}) = C(4, me_g) = C(2, me_g)$
$\{v_x\} = \rho(g, 0) = C(3, me_g) = C(8, [\,])$

Thus $(g^3\ @_2\ g^4)\ @_1\ 0^5$ is analyzed with $g$ bound to 0, and $x^7$ is analyzed twice:
with $x$ bound to 1 and with $x$ bound to 2. Accordingly, $\Phi$ is given by
$\Phi((8, [\,]), ac_g) = \{0\}$, $\Phi((5, me_g), ac_x) = \{1\}$, $\Phi((4, me_g), ac_x) = \{2\}$.



4.1 Validity 

Of course, not all flow analyses give a correct description of the program being
analyzed. To formulate a notion of validity, we define a predicate $F \models^{me} e$
(to be read: $F$ analyzes $e$ correctly wrt. the memento environment $me$), with
$(e, me) \in FlowConf_F$. The predicate must satisfy the specification in Fig. 3,
which gives rise to a monotone functional on the complete lattice $\mathcal{P}(FlowConf_F)$;
following the convincing argument of N&N, we define $F \models^{me} e$ as the greatest
fixed point of this functional so as to be able to cope with recursive functions.

In [fun], we deviate from N&N by recording $me$, rather than the restriction
of $me$ to $FV(\mu f.\lambda x.e_0)$. As in P&P, this facilitates the translations to and from
types. In [app], the set $M$ corresponds to P&P’s notion of cover, which in turn
is needed to model the “cartesian product” algorithm of [Age95]. In N&N’s
framework, $M$ is always a singleton $\{m\}$; in that case the condition “$\forall v \in
C_F(l_2, me).\ \ldots$” amounts to the simpler “$C_F(l_2, me) \leq_V \rho_F(x, m)$”.

By structural induction in $ue^l$ we see that if $F \models^{me} ue^l$ then $C_F(l, me) \neq \bot$.
We would also like the converse implication to hold:







T. Amtoft and F. Turbak 



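[var]  $F \models^{me} z^l$ iff $\bot \neq \rho_F(z, me(z)) \leq_V C_F(l, me)$

[fun]  $F \models^{me} \mu f.\lambda^l x.e_0$ iff $\{((\mu f.\lambda x.e_0, me), Mem_F)\} \leq_V C_F(l, me)$

[app]  $F \models^{me} ue_1^{l_1}\ @_l\ ue_2^{l_2}$ iff
    $C_F(l, me) \neq \bot \ \wedge\ F \models^{me} ue_1^{l_1} \ \wedge\ F \models^{me} ue_2^{l_2} \ \wedge$
    $\forall (ac_0, M_0) \in C_F(l_1, me)$:
      let $M = \Phi_F((l_2, me), ac_0)$ and $(\mu f.\lambda x.ue_0^{l_0}, me_0) = ac_0$ in
      $M \subseteq M_0 \ \wedge\ \forall v \in C_F(l_2, me).\ \exists m \in M.\ \{v\} \leq_V \rho_F(x, m) \ \wedge$
      $\forall m \in M$:  $F \models^{me_0[f, x \mapsto m]} ue_0^{l_0} \ \wedge$
          $C_F(l_0, me_0[f, x \mapsto m]) \leq_V C_F(l, me) \ \wedge$
          $\rho_F(x, m) \neq \bot \ \wedge\ \{(ac_0, Mem_F)\} \leq_V \rho_F(f, m)$

[con]  $F \models^{me} c^l$ iff $\mathsf{Int} \in C_F(l, me)$

[suc]  $F \models^{me} \mathsf{succ}^l\ e_1$ iff $F \models^{me} e_1 \ \wedge\ \mathsf{Int} \in C_F(l, me)$

[if]  $F \models^{me} \mathsf{if0}^l\ e_0\ \mathsf{then}\ ue_1^{l_1}\ \mathsf{else}\ ue_2^{l_2}$ iff
    $F \models^{me} e_0 \ \wedge\ F \models^{me} ue_1^{l_1} \ \wedge\ F \models^{me} ue_2^{l_2} \ \wedge$
    $C_F(l_1, me) \leq_V C_F(l, me) \ \wedge\ C_F(l_2, me) \leq_V C_F(l, me)$



Fig. 3. The flow logic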



Definition 1. Let a flow analysis $F$ for $P$ be given. We say that $F$ is valid iff
(i) $F \models^{[\,]} P$; (ii) whenever $e = ue^l \in SubExpr_P$ with $(e, me) \in FlowConf_F$ and
$C_F(l, me) \neq \bot$ then $F \models^{me} e$.

Using techniques as in N&N, we can augment Fig. 3 with rules for bind and
close and then prove a subject reduction property for flows which for closed $E$
reads: if $E$ reduces to $E'$ in one evaluation step and $F \models^{[\,]} E$ then $F \models^{[\,]} E'$.

So far, even for badly behaved programs like P = 7 @ 9 it is possible (just as 
in N&N) to find an F for P such that F is valid. Since our type system rejects
such programs, we would like to filter them out: 

Definition 2. Let a flow analysis $F$ for $P$ be given. We say that $F$ is safe
iff for all $ue^l$ in $SubExpr_P$ and for all $me$ it holds: (i) if $ue = ue_1^{l_1}\ @\ e_2$ then
$\mathsf{Int} \notin C_F(l_1, me)$; (ii) if $ue = \mathsf{succ}\ ue_1^{l_1}$ then $v \in C_F(l_1, me)$ implies $v = \mathsf{Int}$; (iii)
if $ue = \mathsf{if0}\ ue_0^{l_0}\ \mathsf{then}\ e_1\ \mathsf{else}\ e_2$ then $v \in C_F(l_0, me)$ implies $v = \mathsf{Int}$.

Example 4. Referring back to Example 3, it clearly holds that the flow analysis given
there is safe, and it is easy (though a little cumbersome) to verify that it is valid.
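
The safety conditions of Definition 2 are local checks on the cache, so they can be sketched in a few lines. The encoding is an assumption: `checks` lists which labels sit in operator, successor-argument, or test position; `cache` maps `(label, memento environment)` to a flow set, with missing entries standing for $\bot$; and the cache shown is our transcription of Example 3.

```python
INT = "Int"


def is_safe(checks, cache):
    """checks: (kind, label) pairs; cache: dict (label, me) -> flow set."""
    kinds = {label: kind for kind, label in checks}
    for (label, _me), flow_set in cache.items():
        kind = kinds.get(label)
        if kind == "operator" and INT in flow_set:
            return False  # an integer may be applied as a function
        if kind in ("succ_arg", "if0_test") and any(v != INT for v in flow_set):
            return False  # a closure may reach an arithmetic position
    return True


ME_G, ME_X1, ME_X2 = (("g", 0),), (("x", 1),), (("x", 2),)
V_G = ("ac_g", frozenset({0}))
V_X = ("ac_x", frozenset({1, 2}))
V_X_PRIME = ("ac_x", frozenset({1}))

EXAMPLE_3_CACHE = {
    (6, ()): {V_G}, (8, ()): {V_X}, (0, ()): {INT},
    (3, ME_G): {V_X}, (4, ME_G): {V_X_PRIME}, (2, ME_G): {V_X_PRIME},
    (5, ME_G): {INT}, (1, ME_G): {INT},
    (7, ME_X1): {INT}, (7, ME_X2): {V_X_PRIME},
}
# Operator positions of the three applications @_0, @_1 and @_2.
EXAMPLE_3_CHECKS = [("operator", 6), ("operator", 2), ("operator", 3)]

# The ill-behaved program 7^1 @_0 9^2: an integer sits in operator position.
BAD_CACHE = {(1, ()): {INT}, (2, ()): {INT}, (0, ()): set()}
BAD_CHECKS = [("operator", 1)]
```

Validity, by contrast, is a coinductive property and cannot be checked entry by entry in this way.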

4.2 Taxonomy of Flow Analyses 

Two common categories of flow analyses are the “call-string based” (e.g., [Shi91]) 
and the “argument-based” (e.g., [Sch95,Age95]). Our descriptive framework can 
model both approaches (which can be “mixed”, as in [NN99]). 

A flow analysis $F$ for $P$ such that $F$ is valid is in $CallString_\beta$, where $\beta$ is a
mapping from $Labs_P \times MemEnv_F$ into $Mem_F$, iff whenever $\Phi_F((l_2, me), ac)$ is defined
it equals $\{\beta(l, me)\}$ where $l$ is such that⁴ $e_1\ @_l\ ue_2^{l_2} \in SubExpr_P$. All k-CFA
analyses fit into this category: for 0-CFA we take $Mem_F = \{\bullet\}$ and $\beta(l, me) = \bullet$;
for 1-CFA we take $Mem_F = Labs_P$ and $\beta(l, me) = l$; and for 2-CFA (the generalization
to $k > 2$ is immediate) we take $Mem_F = Labs_P \cup (Labs_P \times Labs_P)$ and
define $\beta(l, me)$ as follows: let it be $l$ if $me = [\,]$, and let it be $(l, l_1)$ if $me$ takes
the form $me'[z \mapsto m]$ with $m$ either $l_1$ or $(l_1, l_2)$.
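
The three choices of $\beta$ can be written down directly. The encoding is an assumption: a memento environment is a list of `(variable, memento)` bindings with the most recent binding last, a 2-CFA memento is either a label or a pair of labels, and the string `"•"` stands for the single 0-CFA memento.

```python
def beta_0cfa(label, me):
    """0-CFA: a single memento, so all calls are merged."""
    return "•"


def beta_1cfa(label, me):
    """1-CFA: the memento is the label of the application."""
    return label


def beta_2cfa(label, me):
    """2-CFA: pair the application label with the most recent enclosing one."""
    if not me:
        return label
    _variable, memento = me[-1]
    previous = memento[0] if isinstance(memento, tuple) else memento
    return (label, previous)
```

For instance, `beta_2cfa(7, [("g", 0), ("x", (5, 0))])` keeps the current label 7 and the first component 5 of the latest memento, dropping the older label 0.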

A flow analysis $F$ for $P$ such that $F$ is valid is in $ArgBased_\alpha$ iff for all non-recursive
variables $x$ and mementoes $m$ it holds that whenever $\rho_F(x, m) \neq \bot$
then $\varepsilon_V(\rho_F(x, m)) = \alpha(m)$ where $\varepsilon_V$ removes the $M$ component of a flow value.
For this kind of analysis, a memento $m$ essentially denotes a set of abstract closures.
To more precisely capture specific argument-based analyses, such as [Age95]
or the type-directed approach of [JWW97], we may impose further demands on $\alpha$.

Example 5. The flow analysis of Example 3 is a 1-CFA and also in $ArgBased_\alpha$, with $\alpha(0) =
\alpha(2) = \{ac_x\}$ and $\alpha(1) = \{\mathsf{Int}\}$.

Given a program $P$, it turns out that for all $\beta$ the class $CallString_\beta$, and for
certain kinds of $\alpha$ also the class $ArgBased_\alpha$, contains a least (i.e., most precise)
flow analysis; here the ordering on flow analyses is defined pointwise⁵ on $C_F$, $\rho_F$
and $\Phi_F$. This is much as in N&N where for all total and deterministic “instantiators”
the corresponding class of analyses contains a least element, something
we cannot hope for since we allow $\Phi_F$ to return a non-singleton.



4.3 Reachability 

For a flow analysis $F$, some entries may be garbage. To see an example of this,
suppose that $\mu f.\lambda x.ue^l$ is in $SubExpr_P$, and suppose that $\rho_F(x, m) = \bot$ for all
$m \in Mem_F$. From this we infer that the above function is never called, so for all
$me$ the value of $C_F(l, me)$ is uninteresting. It may therefore be replaced by $\bot$,
something which is in fact achieved by the roundtrip described in Sect. 7.1.

To formalize a notion of reachability we introduce a set $Reach_F$ that is intended
to encompass⁶ all entries of $C_F$ and $\rho_F$ that are “reachable” from the root of
$P$. Let $Analyzes_F^m(\mu f.\lambda x.ue_0^{l_0}, me)$ be a shorthand for $C_F(l_0, me[f, x \mapsto m]) \neq \bot$
and $\rho_F(x, m) \neq \bot$ and $\{((\mu f.\lambda x.ue_0^{l_0}, me), Mem_F)\} \leq_V \rho_F(f, m)$. We define
$Reach_F$ as the least set satisfying:

[prg]  $(P, [\,]) \in Reach_F$

[fun]  $(\mu f.\lambda^l x.ue_0^{l_0}, me) \in Reach_F \ \wedge\ Analyzes_F^m(\mu f.\lambda x.ue_0^{l_0}, me) \Rightarrow$
       $\{(ue_0^{l_0}, me[f, x \mapsto m]), (x, m), (f, m)\} \subseteq Reach_F$

[app]  $(e_1\ @_l\ e_2, me) \in Reach_F \Rightarrow \{(e_1, me), (e_2, me)\} \subseteq Reach_F$

⁴ It is tempting to write “$\Phi_F((l, me), ac_0)$” in Fig. 3 (thus replacing $l_2$ by $l$), but then
subject reduction for flows would not hold.

⁵ Unlike [JWW97], we do not compare analyses with different sets of mementoes.

⁶ This is somewhat similar to the reachability predicate of [GNN97].
[suc]  $(\mathsf{succ}^l\ e_1, me) \in Reach_F \Rightarrow (e_1, me) \in Reach_F$

[if]  $(\mathsf{if0}^l\ e_0\ \mathsf{then}\ e_1\ \mathsf{else}\ e_2, me) \in Reach_F \Rightarrow$
      $\{(e_0, me), (e_1, me), (e_2, me)\} \subseteq Reach_F$

Example 6. For the flow analysis of Example 3, it is easy to verify that for every
subexpression $ue^l$ of the program it holds that $C(l, me) \neq \bot$ iff $(ue^l, me)$ is
reachable, and that $\rho(z, m) \neq \bot$ iff $(z, m)$ is reachable.

Lemma 2. Let $F$ be a flow analysis for $P$ such that $F$ is valid. If $(ue^l, me) \in
Reach_F$ then (i) $C_F(l, me) \neq \bot$ and (ii) whenever $[z \mapsto m] \in me$ then $(z, m) \in
Reach_F$ holds. Also, if $(z, m) \in Reach_F$ then $\rho_F(z, m) \neq \bot$.

5 Translating Types to Flows 

Let a uniform typing $T$ for a program $P$ be given. We now demonstrate how to
construct a corresponding flow analysis $F = \mathcal{F}(T)$ such that $F$ is valid and safe.
First define $Mem_F$ as $IT_T$; note that then an address can serve as a memento
environment. Next we define a function $\mathcal{F}_T$ that translates from $UTyp_T$, that is
the union types that can be built using $IT_T$ and $UT_T$, into $FlowSet_F$:

$\mathcal{F}_T(\bigvee_{i \in I}\{q_i : t_i\}) =$

    $\{((\mu f.\lambda x.e, me), M) \mid \exists i \in I$ with $M = dom(t_i)$:
        a judgement for $\mu f.\lambda^l x.e$ occurs in $D_T$ with address $me$
        and is justified by [fun]$^{(q_i : t)}$ where $t \leq_\wedge t_i\}$

    $\cup$ (if $\exists i$ such that $q_i = q_{int}$ then $\{\mathsf{Int}\}$ else $\emptyset$)

The idea behind the translation is that $\mathcal{F}_T(u)$ should contain all the closures
that are “sources” of elementary types in $u$; it is easy to trace such closures
thanks to the presence of U-tags. The condition $t \leq_\wedge t_i$ is needed as a “sanity
check”, quite similar to the “trimming” performed in [Hei95], to guard against
the possibility that two unrelated entities in $D_T$ incidentally have used the same
U-tag $q_i$. As the types of P&P do not contain fun-witnesses, their translation
has to rely solely on this sanity check (at the cost of precision, cf. Sect. 1).

Lemma 3. The function $\mathcal{F}_T$ is monotone.

Definition 3. With $T$ a typing for $P$, the flow analysis $F = \mathcal{F}(T)$ is given by
$(P, IT_T, C_F, \rho_F, \Phi_F)$, where $C_F$, $\rho_F$, and $\Phi_F$ are defined below:

$C_F(l, me) = \mathcal{F}_T(u)$ iff $D_T$ contains a judgement $A \vdash ue^l : u$ with address $me$
$\rho_F(z, m) = \mathcal{F}_T(u)$ iff $u = \mathcal{A}_T(z, m)$

$\Phi_F((l_2, me), (\mu f.\lambda x.e_0, me')) = M$ iff there exists $q$ such that $D_T$ contains
    a judgement for $\mu f.\lambda x.e_0$ at $me'$ derived by [fun]$^{(q:t)}$ and
    a judgement for $e_1\ @_l\ ue_2^{l_2}$ at $me$ derived by [app]$^{w^@}$ where $w^@(q) = M$.



Example 7. With terminology as in Examples 2 and 3, it is easy to check that
the translation maps $u'_x$ to $\{v'_x\}$ and $u_x$ to $\{v_x\}$, and that the flow analysis
of Example 3 is the translation of the typing of Example 2.
We have the following result, where the proof that $F$ is valid is by coinduction.

Theorem 1. With $T$ a uniform typing for $P$, for $F = \mathcal{F}(T)$ it holds that
- $F$ is valid and safe

- $(ue^l, me) \in Reach_F$ iff $C_F(l, me) \neq \bot$ (for $ue^l \in SubExpr_P$)

- $(z, m) \in Reach_F$ iff $\rho_F(z, m) \neq \bot$

6 Translating Flows to Types 

Let a flow analysis $F$ for a program $P$ be given, and assume that $F$ is valid
and safe. We now demonstrate how to construct a corresponding uniform typing
$T = \mathcal{T}(F)$. First we define $IT_T$ as $Mem_F$ and $UT_T$ as $AbsClos_F \cup \{q_{int}\}$. Next
we define a function $\mathcal{T}_F$ that translates from $FlowSet_F$ into $UTyp_T$; inspired by
P&P (though the setting is somewhat different) we stipulate:

$\mathcal{T}_F(V) = \bigvee_{v \in V}\{q_v : t_v\}$ where

    if $v = \mathsf{Int}$ then $q_v = q_{int}$ and $t_v = \mathsf{int}$
    if $v = (ac, M)$ with $ac = (\mu f.\lambda x.e^{l_0}, me)$
    then $q_v = ac$

    and $t_v = \bigwedge_{m \in M_0}\{\{m\} : \mathcal{T}_F(\rho_F(x, m)) \to \mathcal{T}_F(C_F(l_0, me[f, x \mapsto m]))\}$

    where $M_0 = \{m \in M \mid Analyzes_F^m(ac)\}$.

The above definition determines a unique union type $\mathcal{T}_F(V)$, since recursion
is “beneath a constructor” and since $FlowSet_F$ is finite (ensuring regularity).

Example 8. With terminology as in Examples 2 and 3, it is easy to see (provided
that $q_x$ is considered another name for $ac_x$) first that $\mathcal{T}(\{v'_x\}) = u'_x$, and then
that $\mathcal{T}(\{v_x\}) = u_x$ since $\mathcal{T}(\{v_x\}).q_x$ can be found as

$\bigwedge(\{1\} : \mathcal{T}(\rho(x, 1)) \to \mathcal{T}(C(7, me_{x1})),\ \{2\} : \mathcal{T}(\rho(x, 2)) \to \mathcal{T}(C(7, me_{x2})))$
$= \bigwedge(\{1\} : \mathcal{T}(\{\mathsf{Int}\}) \to \mathcal{T}(\{\mathsf{Int}\}),\ \{2\} : \mathcal{T}(\{v'_x\}) \to \mathcal{T}(\{v'_x\}))$
$= \bigwedge(\{1\} : u_{int} \to u_{int},\ \{2\} : u'_x \to u'_x)$.

Note that without the $M$ component in a flow value $(ac, M)$, $v_x$ would equal $v'_x$,
causing $\mathcal{T}(\{v_x\})$ to be an infinite type (as in P&P).

Lemma 4. The function $\mathcal{T}_F$ is monotone.

For $z$ and $m$ such that $(z, m) \in Reach_F$, we define $\mathcal{T}_F(z, m)$ as $\mathcal{T}_F(\rho_F(z, m))$
(by Lemma 2 this is well-defined). And for $e = ue^l$ and $me$ such that $(e, me) \in
Reach_F$, we construct a judgement $\mathcal{T}_F(e, me)$ as

$\mathcal{T}_F(me) \vdash e : \mathcal{T}_F(C_F(l, me))$

where $\mathcal{T}_F(me)$ is defined recursively by $\mathcal{T}_F([\,]) = [\,]$ and $\mathcal{T}_F(me[z \mapsto m]) =
\mathcal{T}_F(me)[z \mapsto \mathcal{T}_F(z, m)]$ (by Lemma 2 also this is well-defined).
Definition 4. With $F$ a flow analysis for $P$, the typing $T = \mathcal{T}(F)$ is given by
$(P, Mem_F, AbsClos_F \cup \{q_{int}\}, D_T)$, where $D_T$ is defined by stipulating that whenever
$(e, me)$ is in $Reach_F$ then $D_T$ contains $\mathcal{T}_F(e, me)$, and that $\mathcal{T}_F(e', me')$ is
a premise of $\mathcal{T}_F(e, me)$ iff $(e, me) \in Reach_F$ is among the immediate conditions
(cf. the definition of $Reach_F$) for $(e', me') \in Reach_F$.

Example 9. It is easy to check that the typing of Example 2 is the translation of the
flow analysis of Example 3, modulo renaming of the U-tags.

Clearly $D_T$ is a tree-formed derivation, and $\mathcal{T}_F(e, me)$ has address $me$ in $D_T$.
We must of course also prove that all judgements in $D_T$ are in fact derivable
from their premises using the inference rules in Fig. 1:

Theorem 2. If $F$ is valid and safe then $T = \mathcal{T}(F)$ as constructed by Definition 4
is a uniform typing for $P$. The derivation $D_T$ has the following properties:

- if $D_T$ contains at address $me$ a judgement for $\mu f.\lambda x.e$, it is derived using
  [fun]$^{w}$ where $w = (ac : (\mathcal{T}_F(\{(ac, Mem_F)\})).ac)$ with $ac = (\mu f.\lambda x.e, me)$;

- if $D_T$ contains at address $me$ a judgement for $e_1\ @_l\ ue_2^{l_2}$ with the leftmost
  premise of the form $A \vdash e_1 : u_1$, then it is derived using [app]$^{w^@}$ where for
  all $q \in dom(u_1)$ it holds that $w^@(q) = \Phi_F((l_2, me), q)$.

7 Round Trips 

Next consider the “round-trip” translations $\mathcal{F} \circ \mathcal{T}$ (from flows to types and back)
and $\mathcal{T} \circ \mathcal{F}$ (from types to flows and back). Both roundtrips are idempotent⁷: they
act as the identity on “canonical” elements, and otherwise “canonicalize”.

Example 10. Exs. 7 and 9 show that $\mathcal{F} \circ \mathcal{T}$ is the identity on the flow analysis of
Example 3 and that $\mathcal{T} \circ \mathcal{F}$ is the identity (modulo renaming of U-tags) on the typing
of Example 2. In particular $\mathcal{T} \circ \mathcal{F}$ does not
necessarily introduce infinite types, thus solving an open problem in P&P.



7.1 Round Trips from the Flow World 

$\mathcal{F} \circ \mathcal{T}$ filters out everything not reachable, and acts as the identity ever after.

Theorem 3. Assume that $F$ is valid and safe for a program $P$, and let $F' =
\mathcal{F}(\mathcal{T}(F))$. Then $F'$ is valid and safe for $P$ with $Mem_{F'} = Mem_F$, $Reach_{F'} =
Reach_F$, and $C_{F'}(l, me) \neq \bot$ iff $C_F(l, me) \neq \bot$ and $(ue^l, me) \in Reach_F$, in
which case $C_{F'}(l, me) = filter_F(C_F(l, me))$ where $filter_F(V)$ is given by
$\{(ac, M') \mid (ac, M) \in V$ and $(\mu f.\lambda x.e_0, me_0) \in Reach_F$ where
    $ac = (\mu f.\lambda x.e_0, me_0)$ and $M' = \{m \in M \mid (e_0, me_0[f, x \mapsto m]) \in Reach_F\}\}$
$\cup$ (if $\mathsf{Int} \in V$ then $\{\mathsf{Int}\}$ else $\emptyset$).

Finally, if $F'' = \mathcal{F}(\mathcal{T}(F'))$ then $F'' = F'$.
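
The filter in Theorem 3 can be sketched as follows, assuming the reachable set has already been computed. The encoding is hypothetical: expressions are referred to by name, a memento environment is a tuple of `(variable, memento)` bindings, `functions` maps a function name to `(f, x, body name)`, and a plain lambda (with `f` set to `None`) only binds its parameter `x`.

```python
INT = "Int"


def extend(me, f, x, m):
    """me[f, x -> m]; a plain lambda (f is None) binds only x."""
    bindings = ((x, m),) if f is None else ((f, m), (x, m))
    return me + bindings


def filter_flow_set(flow_set, reach, functions):
    """Keep reachable closures and only those sinks whose body analysis is reachable."""
    kept = set()
    for value in flow_set:
        if value == INT:
            kept.add(INT)
            continue
        (fn, me0), mementoes = value
        if (fn, me0) not in reach:
            continue  # the defining occurrence is unreachable: drop the closure
        f, x, body = functions[fn]
        trimmed = frozenset(m for m in mementoes
                            if (body, extend(me0, f, x, m)) in reach)
        kept.add(((fn, me0), trimmed))
    return kept


FUNCTIONS = {"lam_x": (None, "x", "x7"), "lam_y": (None, "y", "y9")}
REACH = {("lam_x", ()), ("x7", (("x", 1),))}
FLOW_SET = {INT, (("lam_x", ()), frozenset({1, 2})), (("lam_y", ()), frozenset({1}))}
FILTERED = filter_flow_set(FLOW_SET, REACH, FUNCTIONS)
```

Applying the filter a second time changes nothing, which mirrors the final claim of the theorem that the round trip is idempotent.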

⁷ However, $\mathcal{T}(\mathcal{F}(\mathcal{T}(F))) = \mathcal{T}(F)$ does in general not hold.
Clearly everything not reachable may be considered “junk”. However, simple
examples demonstrate that some junk is reachable and is hence not removed by
$\mathcal{F} \circ \mathcal{T}$. That our flow/type correspondence can faithfully encode such imprecisions
illustrates the power of our framework.

7.2 Round Trips from the Type World 

The canonical typings are the ones that are strongly consistent: 

Definition 5. A typing T is strongly consistent iff for all u that occur in Dt 
and for all q € dom{u) with gi„t the following holds: Dt contains exactly one 
judgement derived by an application of [fun]“ with taking the form {q : t), 
and this t satisfies t <f u.q. Here <f is a subrelation of</^, defined by stipulating 
that int int and that : Ui K} A - 

Theorem 4. Assume that T is a uniform typing for a program P, and let T' = 
T{T{T)). Then T' is a uniform typing for P with ITt' =ITt, and 

— Dt' contains a judgement for e with address ke iff Dt contains a judgement 
for e with address ke (i.e., the two derivations have the same shape); 

— Dt' is strongly consistent; 

— if Dt is strongly consistent then Dt> = Dt (modulo renaming of U-tags). 

Example 11. Let T be the typing® of the motivating example put forward in 
Sect. 1. Then T is not strongly consistent, but T' = T{T{T)) is: the two fun- 
witnesses occurring in Dt' are of the form (^x : Uint — >■ Mint) and {qy : Ui„t — >■ Mint)- 
Nevertheless, T' is still imprecise: both function abstractions are assigned the 
union type V(<?x : Mint Mint,<?y : Mint -f Mint)- 



8 Discussion 

Our flow system follows the lines of N&N, generalizing some features while omit- 
ting others (such as polymorphic splitting [WJ98], left for future work). That 
it has substantial descriptive power is indicated by the fact that it encompasses 
both argument-based and call-string based polyvariance. In particular, the flow 
analysis framework of P&P can be encoded into our framework. Unlike P&P, our 
flow logic has a subject reduction property, inherited from the N&N approach. 

The generality of our type system is less clear. The annotation with tags gives 
rise to intersection and union types that are not associative, commutative, or 
idempotent (ACI). This stands in contrast to the ACI types of P&P, but is similar 
to the non- ACI intersection and union types of CIL, the intermediate language of 
an experimental compiler that integrates flow information into the type system 
[WDMT97,DMTW97]. Indeed, a key motivation of this work was to formalize 
the encoding of various flow analyses in the CIL type system. Developing a 
translation between the type system of this paper and CIL is our next goal.

® We convert it to our framework by substituting Mint for int and by substituting 
V)?* : A({*} : Mint ->■ Mint)) for int ->■ int. 

Abstract. JavaSpaces and TSpaces are two coordination middlewares
for distributed Java programming recently proposed by Sun and IBM, 
respectively. They are both inspired by the Linda coordination model: 
processes interact via the emission (out), consumption (in) and the test 
for absence (inp) of data inside a shared repository. The most interesting 
improvement introduced by these new products is the event notification
mechanism (notify): a process can register interest in the incoming ar- 
rivals of a particular kind of data, and then receive communication of 
the occurrence of these events. We investigate the expressiveness of this 
new coordination mechanism and we prove that even if event notifica- 
tion strictly increases the expressiveness of a language with only input 
and output, the obtained language is still strictly less expressive than a 
language containing also the test for absence. 



1 Introduction 

In recent decades we have witnessed a dramatic evolution of computing systems,
leading from stand-alone mainframes to a worldwide network connecting smaller,
yet much more powerful, processors. The next expected step in this direction
is represented by so-called ubiquitous computing, based on the idea of dynamically
reconfigurable federations composed of users and the resources required by
those users. For instance, the Jini architecture [19] represents a first proposal by
Sun for a Java-based technology inspired by this new computing paradigm.

In this scenario, one of the most challenging topics is the coordination
of the federated components. For this reason, there has been renewed interest
in coordination languages, which have been around for more than fifteen years.
For example, JavaSpaces [18] and TSpaces [20] are two recent
coordination middlewares for distributed Java programming proposed by Sun
and IBM, respectively. These proposals incorporate the main features of both
historical groups of coordination models [13]: the data-driven approach,
initiated by Linda [8] and based on the notion of a shared data repository,
and the control-driven model, advocated by Manifold [1] and centered around
the concepts of raising and reacting to events. Besides the typical Linda-like
coordination primitives (processes interact via the introduction, consumption,
and test for presence/absence of data inside a shared repository), both JavaSpaces
and TSpaces provide event registration and notification. This mechanism allows
a process to register interest in the future arrivals of a particular kind of data,
and then receive communication of the occurrence of these events.

* Work partially supported by Esprit working group n. 24512 “Coordina”

In this paper we investigate the interplay of the event notification mechanism 
with the classical Linda-like coordination paradigm. In particular we focus on the 
expressive power of event notification and we prove the existence of a hierarchy 
of expressiveness among the possible combinations of coordination primitives: 
in, out, and inp are strictly more expressive than in, out, and notify, which in 
turn are strictly more expressive than in and out only. 

These results are proved by introducing a minimal language containing all the
coordination mechanisms we are dealing with, and by considering the sublanguages
corresponding to the various combinations of the coordination primitives.
The complete language (denoted by $L_{ntf,inp}$) is obtained by extending a Linda-based
process algebra presented in [2] with the event notification mechanism.
We consider the following sublanguages: $L$, containing only in and out; $L_{ntf}$,
containing also notify; and $L_{inp}$, containing in, out, and inp.



[Figure: diagram of the languages, with arrows labelled “sublanguage”, “encoding”, and “no encoding”.]

Fig. 1. Overview of the results.

The hierarchy of expressiveness sketched above follows from the three results 
summarized in Figure 1. 

The expressiveness gap between $L_{ntf}$ and $L$ can be deduced from the following
facts:

(1) There exists an encoding of $L$ on finite Place/Transition nets [14,16] which
preserves the interleaving semantics. As the existence of a terminating
computation is decidable in P/T nets [6], the same holds also in $L$.

(2) There exists a nondeterministic implementation of Random Access Machines
(RAM) [17], a well-known Turing-powerful formalism, in $L_{ntf}$. The
implementation preserves the terminating behaviour: a RAM terminates if and
only if the corresponding implementation has a terminating computation.
Thus, the existence of a terminating computation is not decidable in $L_{ntf}$.

Hence there exists no encoding of $L_{ntf}$ in $L$ which preserves at least the existence
of a terminating computation.

The discrimination between $L_{inp}$ and $L_{ntf}$ proceeds in a similar way:

(3) There exists an encoding of $L_{ntf}$ on finite Place/Transition nets extended
with transfer arcs [7] which preserves the existence of an infinite computation.
As this property is decidable in this kind of P/T nets, the same holds
also in $L_{ntf}$.

(4) There exists a deterministic implementation of RAM in $L_{inp}$ such that a
RAM terminates if and only if all the computations of the corresponding
implementation terminate. Thus, the existence of an infinite computation is
not decidable in $L_{inp}$.

Hence there exists no encoding of $L_{inp}$ in $L_{ntf}$ which preserves at least the
existence of an infinite computation.

Finally, the last result is:

(5) The event notification mechanism can be realized by means of the inp operator;
indeed we provide an encoding of $L_{ntf,inp}$ in $L_{inp}$ (and hence also of
$L_{ntf}$ in $L_{inp}$).

The paper is organized as follows. Section 2 presents the syntax and semantics
of the language. Sections 3, 4, and 5 discuss, respectively, the discriminating results
between $L_{ntf}$ and $L$, between $L_{inp}$ and $L_{ntf}$, and the encoding of $L_{ntf,inp}$ in $L_{inp}$.
Section 6 reports some concluding remarks.



2 The Syntax and the Operational Semantics 

Let Name be a denumerable set of message names, ranged over by a, b, . . .. The
syntax is defined by the following grammar:

$P ::= \langle a \rangle \mid C \mid P|P$
$C ::= 0 \mid \mu.C \mid inp(a)?C\_C \mid C|C$

where:

$\mu ::= in(a) \mid out(a) \mid notify(a, C) \mid\ !in(a)$

Agents, ranged over by P, Q, . . ., consist of the parallel composition of the data
already in the dataspace (each one denoted by one agent ⟨a⟩) and the concurrent
programs, denoted by C, D, . . ., that share these data. A program can be a
terminated program 0 (which is usually omitted for the sake of simplicity), a
prefix form μ.P, an if-then-else form inp(a)?P_Q, or the parallel composition
of programs.

A prefix μ can be one of the primitives in(a) or out(a), indicating the withdrawal
or the emission of datum a respectively, or the notify(a, P) operation
that registers interest in the arrival of new instances of datum a: every
time a new instance of ⟨a⟩ is produced, a new copy of process P is spawned. We
also consider the bang operator !in(a), which is a form of replication guarded on
input operations: the term !in(a).P is always ready to consume an instance of
⟨a⟩ and then activate a copy of P. The if-then-else form is used to model the
inp primitive: inp(a)?P_Q is a program which requires an instance of ⟨a⟩ to be
consumed; if it is present, the program P is executed, otherwise Q is chosen. In
the following, Agent denotes the set containing all possible agents.

Table 1. Operational semantics (symmetric rules omitted).

(1) $\langle a \rangle \xrightarrow{\overline{a}} 0$
(2) $out(a).P \xrightarrow{\vec{a}} \langle a \rangle | P$
(3) $in(a).P \xrightarrow{a} P$
(4) $notify(a, Q).P \xrightarrow{\tau} on(a).Q | P$
(5) $on(a).P \xrightarrow{\dot{a}} P | on(a).P$
(6) $!in(a).P \xrightarrow{a} P | !in(a).P$
(7) $inp(a)?P\_Q \xrightarrow{a} P$
(8) $inp(a)?P\_Q \xrightarrow{\neg a} Q$
(9) $\dfrac{P \xrightarrow{a} P' \quad Q \xrightarrow{\overline{a}} Q'}{P|Q \xrightarrow{\tau} P'|Q'}$
(10) $\dfrac{P \xrightarrow{\neg a} P' \quad Q \not\xrightarrow{\overline{a}}}{P|Q \xrightarrow{\neg a} P'|Q}$
(11) $\dfrac{P \xrightarrow{\dot{a}} P' \quad Q \xrightarrow{\dot{a}} Q'}{P|Q \xrightarrow{\dot{a}} P'|Q'}$
(12) $\dfrac{P \xrightarrow{\dot{a}} P' \quad Q \not\xrightarrow{\dot{a}}}{P|Q \xrightarrow{\dot{a}} P'|Q}$
(13) $\dfrac{P \xrightarrow{\vec{a}} P' \quad Q \xrightarrow{\dot{a}} Q'}{P|Q \xrightarrow{\vec{a}} P'|Q'}$
(14) $\dfrac{P \xrightarrow{\vec{a}} P' \quad Q \not\xrightarrow{\dot{a}}}{P|Q \xrightarrow{\vec{a}} P'|Q}$
(15) $\dfrac{P \xrightarrow{\alpha} P'}{P|Q \xrightarrow{\alpha} P'|Q}$ with $\alpha \neq \neg a, \dot{a}, \vec{a}$

The semantics of the language is described via a labeled transition system
(Agent, Label, →) where Label = {τ} ∪ {a, $\overline{a}$, ¬a, $\dot{a}$, $\vec{a}$ | a ∈ Name} (ranged
over by α, β, . . .) is the set of the possible labels. The labeled transition relation
→ is the smallest one satisfying the axioms and rules in Table 1. For the sake
of simplicity we have omitted the symmetric rules of (9)-(15).

Axiom (1) indicates that ⟨a⟩ is able to give its contents to the environment
by performing an action labeled with $\overline{a}$. Axiom (2) describes the output: in one
step a new datum is produced and the corresponding continuation is activated.
The production of this new instance of ⟨a⟩ is communicated to the environment
by decorating this action with the label $\vec{a}$. Axiom (3) associates to the action
performed by the prefix in(a) the label a, which is the complement of $\overline{a}$.

Axiom (4) indicates that notify(a, P) produces a new kind of agent on(a).P
(that we add to the syntax as an auxiliary term). This process spawns a new
instance of P every time a new ⟨a⟩ is produced. This behaviour is described in
axiom (5), where the label $\dot{a}$ is used to describe this kind of computation step.
The term !in(a).P is able to activate a new copy of P by performing an action
labeled with a that requires an instance of ⟨a⟩ to be consumed (axiom (6)).

Axioms (7) and (8) describe the semantics of inp(a)?P_Q: if the required
⟨a⟩ is present it can be consumed (axiom (7)), otherwise its absence is guessed
by performing an action labeled with ¬a (axiom (8)). Rule (9) is the usual
synchronization rule.

Rules (10)-(14) regard the way actions labeled with the non-standard labels
¬a, $\dot{a}$, and $\vec{a}$ are inferred for structured terms. Rule (10) indicates that actions
labeled with ¬a can be performed only if no ⟨a⟩ is present in the environment
(i.e. no transition labelled with $\overline{a}$ can be performed). Rules (11) and (12) consider
actions labelled with $\dot{a}$, indicating the interest in the incoming instances of ⟨a⟩.
If one process able to perform this kind of action is composed in parallel with
another one registered for the same event, their local actions are combined in a
global one (rule (11)); otherwise, the process performs its own action leaving the
environment unchanged (rule (12)). Rules (13) and (14) deal with two different
cases regarding the label $\vec{a}$, indicating the arrival of a new instance of ⟨a⟩: if
processes waiting for the notification of this event are present in the environment
they are woken up (rule (13)); otherwise, the environment is left unchanged (rule
(14)). The last rule (15) is the standard local rule that can be applied only to
actions different from the non-standard ¬a, $\dot{a}$, and $\vec{a}$.

Note that rules (10), (12), and (14) use negative premises; however, our
operational semantics is well defined, because our transition system specification
is strictly stratifiable [9], a condition that ensures (as proved in [9]) the existence
of a unique transition system agreeing with it.
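
The informal reading of the four primitives can be illustrated with a small sequential sketch in Python. It is only an approximation under stated assumptions: the class `Dataspace` and its method names are hypothetical, reactions registered by notify run synchronously instead of being spawned as concurrent processes, and a blocked in(a) is modelled by returning False rather than by suspending the caller.

```python
from collections import Counter
from typing import Any, Callable, Dict, List


class Dataspace:
    """Hypothetical sequential model of the shared repository (a multiset of names)."""

    def __init__(self) -> None:
        self.data: Counter = Counter()
        self.listeners: Dict[str, List[Callable[["Dataspace"], None]]] = {}

    def out(self, a: str) -> None:
        # Emit a new instance of <a>, then notify every listener registered on a.
        self.data[a] += 1
        # Snapshot the list: listeners added during a reaction miss this event.
        for reaction in list(self.listeners.get(a, [])):
            reaction(self)

    def in_(self, a: str) -> bool:
        # Consume one <a> if available; False stands for "would block".
        if self.data[a] > 0:
            self.data[a] -= 1
            return True
        return False

    def inp(self, a: str, then: Callable[["Dataspace"], Any],
            otherwise: Callable[["Dataspace"], Any]) -> Any:
        # inp(a)?P_Q: consume <a> and continue as P, or observe absence and continue as Q.
        if self.data[a] > 0:
            self.data[a] -= 1
            return then(self)
        return otherwise(self)

    def notify(self, a: str, reaction: Callable[["Dataspace"], None]) -> None:
        # Register a permanent listener: it reacts to every later arrival of <a>.
        self.listeners.setdefault(a, []).append(reaction)


space = Dataspace()
log: List[str] = []
space.notify("a", lambda s: log.append("a arrived"))
space.out("a")
space.out("a")
first = space.inp("a", lambda s: "present", lambda s: "absent")
space.in_("a")
last = space.inp("a", lambda s: "present", lambda s: "absent")
```

The listener stays registered after each arrival, mirroring the way on(a).P reproduces itself in axiom (5); the two branches of `inp` correspond to axioms (7) and (8).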

We define a structural congruence (denoted by ≡) as the minimal congruence
relation satisfying the monoidal laws for the parallel composition operator:

P ≡ P|0    P|Q ≡ Q|P    P|(Q|R) ≡ (P|Q)|R

As two structurally congruent agents are observationally indistinguishable, in the
remainder of the paper we will reason up to structural congruence.

In the following we will only consider computations consisting of reduction
steps, i.e., the internal derivations that a stand-alone agent is able to perform
independently of the context. In our language, we consider as reductions not only
the usual derivations labeled with τ, but also the non-standard ones labeled with ¬a
and $\vec{a}$. In fact, the derivation $P \xrightarrow{\neg a} P'$ indicates that P can become P' if no ⟨a⟩ is
available in the external environment, and $P \xrightarrow{\vec{a}} P'$ describes that a new agent
⟨a⟩ has been produced. Hence, in any of these cases, if P is stand-alone (i.e.
without external environment) it is able to become P'. Indeed, these labels have
been used only to help an SOS [15] formulation of the semantics, but they
correspond conceptually to internal steps. Formally, we define reduction steps as
follows:

$P \to P'$ iff $P \xrightarrow{\tau} P'$ or $P \xrightarrow{\neg a} P'$ or $P \xrightarrow{\vec{a}} P'$ for some a

We use $P \not\to$ to state that there exists no P' such that $P \to P'$.

An agent P has a terminating computation (denoted by P↓) if it can block
after a finite number of internal steps: $P \to^* P'$ with $P' \not\to$. On the other
hand, an agent P has an infinite computation (denoted by P↑) if there exists
an infinite computation starting from P: for each natural index i there exists $P_i$
such that $P = P_0$ and $P_i \to P_{i+1}$. Observe that, due to the nondeterminism of
our languages, the two above conditions are not in general mutually exclusive,
i.e., given a process P both P↓ and P↑ may hold.
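
The two predicates can be made concrete on a small example. The following Python sketch assumes that the reduction relation is given explicitly as a finite graph (a hypothetical dictionary from states to successor lists); the languages studied here are in general infinite-state, so this only illustrates the definitions and is not a decision procedure for them.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # hypothetical encoding: state -> list of successor states


def has_terminating_computation(graph: Graph, start: str) -> bool:
    """True if some state with no outgoing reduction is reachable from start."""
    seen = {start}
    stack = [start]
    while stack:
        state = stack.pop()
        successors = graph.get(state, [])
        if not successors:
            return True
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def has_infinite_computation(graph: Graph, start: str) -> bool:
    """True if a cycle is reachable from start (enough for a finite graph)."""
    status: Dict[str, str] = {}  # "open" while on the DFS path, "done" afterwards

    def visit(state: str) -> bool:
        status[state] = "open"
        for nxt in graph.get(state, []):
            if status.get(nxt) == "open":
                return True
            if nxt not in status and visit(nxt):
                return True
        status[state] = "done"
        return False

    return visit(start)


# P can either loop on itself or reduce to the blocked state Q.
both = {"P": ["P", "Q"], "Q": []}
only_terminating = {"A": ["B"], "B": []}
```

In the graph `both`, the state P satisfies the two predicates at once, which is the situation described above as not mutually exclusive.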

3 Comparing $L_{ntf}$ and $L$

The discrimination between $L_{ntf}$ and $L$ is a direct consequence of the facts (1)
and (2) listed in the Introduction.

The proof of (1) is a trivial adaptation of a result presented in [4]. Indeed,
as we did in that paper, it is possible to define for $L$ a Place/Transition net
[14,16] semantics such that for each agent P the corresponding P/T net is finite
and preserves the interleaving semantics; thus, an agent can terminate if and
only if the corresponding net has a terminating computation. As this property
can be decided in finite P/T nets [6], we can conclude that given a process P of
$L$ it is decidable whether P↓.

Result (2) uses Random Access Machines (RAM) [17], which is a Turing
equivalent formalism. A RAM is composed of a finite set of registers, which can
hold arbitrarily large natural numbers, and a program, which is a sequence of
simple numbered instructions, like arithmetical operations (on the contents of
registers) or conditional jumps.

To perform a computation, the inputs are provided in registers $r_1, \ldots, r_m$;
if other registers $r_{m+1}, \ldots, r_n$ are used in the program, they are supposed to
contain the value 0 at the beginning of the computation. The execution of the
program begins with the first instruction and continues by executing the other
instructions in sequence, unless a jump instruction is encountered. The execution
stops when an instruction number higher than the length of the program is
reached. If the program terminates, the result of the computation is the contents
of the registers.

In [12] it is shown that the following two instructions are sufficient to model 
every recursive function: 

— Succ($r_j$): adds 1 to the content of register $r_j$;

— DecJump($r_j$, s): if the content of register $r_j$ is not zero, then decreases it by
1 and goes to the next instruction, otherwise jumps to instruction s.
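
The behaviour of these two instructions can be sketched as a tiny interpreter. The Python code below is a minimal illustration, not part of the formal development: the tuple encoding of instructions, the function name `run_ram`, and the `max_steps` budget (added only so that diverging programs do not loop forever) are assumptions of this sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical instruction encoding used only in this sketch:
#   ("succ", j)        adds 1 to register j
#   ("decjump", j, s)  if register j is non-zero, decrement it and go on; else jump to s
Instruction = Tuple


def run_ram(program: List[Instruction], registers: Dict[int, int],
            max_steps: int = 10_000) -> Dict[int, int]:
    regs = dict(registers)  # registers not mentioned are taken to contain 0
    pc = 1                  # instructions are numbered from 1
    steps = 0
    # Execution stops when the instruction number exceeds the program length.
    while 1 <= pc <= len(program):
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted (the program may diverge)")
        instr = program[pc - 1]
        if instr[0] == "succ":
            j = instr[1]
            regs[j] = regs.get(j, 0) + 1
            pc += 1
        elif instr[0] == "decjump":
            _, j, s = instr
            if regs.get(j, 0) != 0:
                regs[j] -= 1
                pc += 1
            else:
                pc = s
        else:
            raise ValueError(f"unknown instruction: {instr!r}")
        steps += 1
    return regs


# Adds r2 to r1. Register 3 is never incremented, so instruction 3 always jumps to 1.
add_program = [
    ("decjump", 2, 4),  # 1: if r2 == 0, jump past the end (halt); else r2 -= 1
    ("succ", 1),        # 2: r1 += 1
    ("decjump", 3, 1),  # 3: unconditional jump back to instruction 1
]
result = run_ram(add_program, {1: 2, 2: 3})
```

With inputs r1 = 2 and r2 = 3, the loop body runs three times and the program halts by jumping to instruction 4, which lies beyond the end of the program.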

We present an encoding of RAM based on the notify primitive. The encoding
is nondeterministic, as it introduces some extra infinite computations;
nevertheless, it is ensured that a RAM terminates if and only if the corresponding
encoding has a terminating computation. As termination cannot be decided in
Turing equivalent formalisms, the same holds also for $L_{ntf}$. A question remains
open in this section: “Is it possible to define in $L_{ntf}$ a more adequate deterministic
implementation of RAM which preserves also the divergent behaviour?”. The
answer is no, and it is motivated in Section 4, where we prove that the presence
of an infinite computation can be decided in $L_{ntf}$. On the other hand, we will
show in the same section that a deterministic implementation of RAM can be
defined in $L_{inp}$.

The encoding implements nondeterministically DecJump operations: two pos- 
sible behaviours can be chosen, the first is valid if the tested register is not zero, 
the second otherwise. If the wrong choice is made, the computation is ensured to 
be infinite; in this case, we cannot say anything about the corresponding RAM. 
Nevertheless, if the computation terminates, it is ensured that it corresponds to
the computation of the corresponding RAM. Conversely, any computation of the 
RAM is simulated by the computation of the corresponding encoding in which 
no wrong choice is performed. 



Table 2. Encoding RAM in $L_{ntf}$.

$[\![R]\!] = [\![R_1]\!] \mid \ldots \mid [\![R_m]\!] \mid in(loop).DIV$

$[\![i : Succ(r_j)]\!] = !in(p_i).out(r_j).notify(zero_j, INC).out(p_{i+1})$

$[\![i : DecJump(r_j, s)]\!] = !in(p_i).out(loop).in(r_j).in(loop).notify(zero_j, DEC).out(p_{i+1})$
$\qquad \mid\ !in(p_i).out(zero_j).in(zero_j).out(p_s)$

where:

$INC = out(loop).in(match).in(loop)$
$DEC = out(match)$
$DIV = out(div).!in(div).out(div)$



Given the RAM program R composed of the instructions $R_1 \ldots R_m$, the corresponding
encoding is defined in Table 2. Observe that DIV is an agent that
cannot terminate; we will prove that it is activated whenever a wrong choice is
made.

The basic idea of this encoding is to represent the actual content of each register
$r_j$ with a corresponding number of $\langle r_j \rangle$. Moreover, every time an increment
(or a decrement) on the register $r_j$ is performed, a new agent $on(zero_j).INC$
(or $on(zero_j).DEC$) is spawned by using the notify operation. In this way it is
possible to check if the actual content of a register $r_j$ is zero by verifying whether the
occurrences of $on(zero_j).INC$ correspond to those of $on(zero_j).DEC$.

There are two possible wrong choices that can be performed during the com- 
putation: (i) a decrement on a register containing zero or (ii) a jump for zero on 
a non-empty register. 

In case (i), $out(loop).in(r_j).in(loop).notify(zero_j, DEC).out(p_{i+1})$ is activated
with no $\langle r_j \rangle$ available. Thus, the program produces ⟨loop⟩ and blocks
trying to execute $in(r_j)$. The produced ⟨loop⟩ will not be consumed and the
agent DIV will be activated.

In case (ii), the process $out(zero_j).in(zero_j).out(p_s)$ is activated when
there are more occurrences of the auxiliary agent $on(zero_j).INC$ than
of $on(zero_j).DEC$. When $\langle zero_j \rangle$ is emitted, its production is notified to the
auxiliary agents; then the corresponding processes INC and DEC start. Each
DEC emits an agent ⟨match⟩ while each INC produces a term ⟨loop⟩, and requires
a ⟨match⟩ to be consumed before removing the emitted ⟨loop⟩. As there
are more INC processes than DEC, one of the INC processes will block waiting
for an unavailable ⟨match⟩; thus it will not consume its corresponding ⟨loop⟩. As
before, DIV will be activated.

The formal proof of correctness of the encoding requires a representation of
the actual state of a RAM computation: we use $(i, inc_1, dec_1, \ldots, inc_n, dec_n)$,
where i is the index of the next instruction to execute while for each register
index l, $inc_l$ (resp. $dec_l$) represents the number of increments (resp. decrements)
that have been performed on the register $r_l$. The actual content of $r_l$ corresponds
to $inc_l - dec_l$. In order to deal with correct configurations only, we assume that
the number of increments is greater than or equal to the number of decrements.
Given a RAM program R, we write

$((i, inc_1, dec_1, \ldots, inc_n, dec_n), R) \to ((i', inc'_1, dec'_1, \ldots, inc'_n, dec'_n), R)$

to state that the computation moves from the first to the second configuration
by performing the i-th instruction of R; $((i, inc_1, dec_1, \ldots, inc_n, dec_n), R) \not\to$
means that the program R has no instruction i, i.e., the computation has
terminated. As RAM computations are deterministic, given a RAM program R
and a configuration $(i, inc_1, dec_1, \ldots, inc_n, dec_n)$, the corresponding computation
will either terminate (denoted by $((i, inc_1, dec_1, \ldots, inc_n, dec_n), R) \downarrow$) or
diverge ($((i, inc_1, dec_1, \ldots, inc_n, dec_n), R) \uparrow$). As RAMs can model all the
computable functions, neither the termination nor the divergence of a computation
is decidable.

According to this representation technique a configuration is modeled as
follows:

$[\![(i, inc_1, dec_1, \ldots, inc_n, dec_n)]\!] = \langle p_i \rangle \mid \prod_{l=1 \ldots n} \big( \prod_{inc_l} on(zero_l).INC \mid \prod_{dec_l} on(zero_l).DEC \mid \prod_{inc_l - dec_l} \langle r_l \rangle \big)$

where $\prod_{i \in I} P_i$ denotes the parallel composition of the indexed terms $P_i$.

It is not difficult to prove the following theorem, stating that the encoding
is complete, as each RAM computation can be simulated by the corresponding
encoding.

Theorem 1. Let R be a RAM program. If

$((i, inc_1, dec_1, \ldots, inc_n, dec_n), R) \to ((i', inc'_1, dec'_1, \ldots, inc'_n, dec'_n), R)$

then also

$[\![(i, inc_1, dec_1, \ldots, inc_n, dec_n)]\!] \mid [\![R]\!] \to^* [\![(i', inc'_1, dec'_1, \ldots, inc'_n, dec'_n)]\!] \mid [\![R]\!]$

On the other hand the encoding is not sound, as it introduces infinite computations.
Nevertheless, a weaker soundness for terminating computations holds.

Theorem 2. Let R be a RAM program. If

$[\![(i, inc_1, dec_1, \ldots, inc_n, dec_n)]\!] \mid [\![R]\!] \to^* P \not\to$

then $P = [\![(i', inc'_1, dec'_1, \ldots, inc'_n, dec'_n)]\!] \mid [\![R]\!]$ such that

$((i, inc_1, dec_1, \ldots, inc_n, dec_n), R) \to^* ((i', inc'_1, dec'_1, \ldots, inc'_n, dec'_n), R)$

Corollary 1. Let R be a RAM program. Then

$((i, inc_1, dec_1, \ldots, inc_n, dec_n), R) \downarrow$ iff $[\![(i, inc_1, dec_1, \ldots, inc_n, dec_n)]\!] \mid [\![R]\!] \downarrow$

4 Comparing $L_{inp}$ and $L_{ntf}$

The discrimination between $L_{inp}$ and $L_{ntf}$ is a direct consequence of the facts
(3) and (4) listed in the Introduction.

Result (4) has already been proved in [4]. In that paper an encoding of
RAM in a language corresponding to $L_{inp}$ is presented. That encoding (which
we do not report here due to space limits) also represents the content of register
$r_j$ by means of agents of kind $\langle r_j \rangle$. In this way, a DecJump instruction testing the
register $r_j$ can be simply implemented by means of an $inp(r_j)$ operation which
either consumes an available $\langle r_j \rangle$ or observes that the register is empty. In [4]
we prove that a RAM program can perform a computation step if and only if
its encoding can perform the corresponding step.

In order to prove result (3) we recall, using a notation convenient for
our purposes, the definition of simple P/T nets extended with transfer arcs (see,
e.g., [7]).



Definition 1. Given a set S, we denote by $\mathcal{M}_{fin}(S)$ the set of the finite multisets
on S and by $\mathcal{F}_p(S, S)$ the set of the partial functions defined on S. We use $\oplus$
to denote multiset union. A P/T net with transfer arcs is a triple $N = (S, T, m_0)$
where S is the set of places, T is the set of transitions (which are triples
$(c, p, f) \in \mathcal{M}_{fin}(S) \times \mathcal{M}_{fin}(S) \times \mathcal{F}_p(S, S)$ such that the domain of the partial
function f has no intersection with c and p), and $m_0$ is a finite multiset of places.
Finite multisets over the set S of places are called markings; $m_0$ is called the
initial marking. Given a marking m and a place s, m(s) denotes the number of
occurrences of s inside m and we say that the place s contains m(s) tokens. A
P/T net with transfer arcs is finite if both S and T are finite.

A transition t = (c, p, f) is usually written in the form $c \rhd\!\xrightarrow{f} p$, and f is
omitted when empty. The marking c is called the preset of t and represents the
tokens to be consumed. The marking p is called the postset of t and represents
the tokens to be produced. The partial function f denotes the transfer arcs of
the transition, which connect each place s in the domain of f to its image f(s).
The meaning of f is the following: when the transition fires, all the tokens inside
a place s in the domain of f are transferred to the connected place f(s).

A transition t = (c, p, f) is enabled at m if $c \subseteq m$. The execution of the
transition produces the new marking m' such that
$m'(s) = m(s) - c(s) + p(s) + \sum_{s' : f(s') = s} m(s')$ if s is not in the domain of f, and
$m'(s) = \sum_{s' : f(s') = s} m(s')$ otherwise.
This is written as $m \xrightarrow{t} m'$, or simply $m \to m'$ when the transition
t is not relevant. We use σ, σ' to range over sequences of transitions; the empty
sequence is denoted by ε; for $\sigma = t_1, \ldots, t_n$, we write $m \xrightarrow{\sigma} m'$ to mean the firing
sequence $m \xrightarrow{t_1} \cdots \xrightarrow{t_n} m'$. The net $N = (S, T, m_0)$ has an infinite computation
if it has a legal infinite firing sequence.
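
The firing rule can be sketched in a few lines of Python. This is an illustrative reading of the definition above under stated assumptions: places are represented by plain strings, markings by `collections.Counter`, and the names `Transition`, `enabled`, and `fire` are hypothetical.

```python
from collections import Counter
from typing import Dict, NamedTuple


class Transition(NamedTuple):
    preset: Counter           # c: tokens to be consumed
    postset: Counter          # p: tokens to be produced
    transfer: Dict[str, str]  # f: partial function from source place to target place


def enabled(t: Transition, m: Counter) -> bool:
    # t is enabled at m if the preset is contained in the marking.
    return all(m[s] >= k for s, k in t.preset.items())


def fire(t: Transition, m: Counter) -> Counter:
    if not enabled(t, m):
        raise ValueError("transition is not enabled at this marking")
    if set(t.transfer) & (set(t.preset) | set(t.postset)):
        raise ValueError("the domain of f must be disjoint from preset and postset")
    new: Counter = Counter()
    places = set(m) | set(t.postset) | set(t.transfer.values())
    for s in places:
        # Tokens moved into s by the transfer arcs.
        incoming = sum(m[src] for src, dst in t.transfer.items() if dst == s)
        if s in t.transfer:
            new[s] = incoming  # s is emptied: all its tokens are transferred away
        else:
            new[s] = m[s] - t.preset[s] + t.postset[s] + incoming
    return +new  # unary plus drops places holding zero tokens


# A transition that consumes one token, emits <a>, and transfers every token
# from the place on(a,R) to the place arrived(a,R).
t_out = Transition(
    preset=Counter({"out(a).0": 1}),
    postset=Counter({"<a>": 1}),
    transfer={"on(a,R)": "arrived(a,R)"},
)
m0 = Counter({"out(a).0": 1, "on(a,R)": 2})
m1 = fire(t_out, m0)
```

In this example both tokens in on(a,R) end up in arrived(a,R), while the token for out(a).0 is replaced by a single token in the place for the emitted datum.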

The basic idea underlying the definition of an operational net semantics for
a process algebra is to decompose a process P into a multiset of sequential
components, which can be thought of as running in parallel. Each sequential
component has a corresponding place in the net, and will be represented by a
token in that place. Reductions are represented by transitions which consume
and produce multisets of tokens.

In our particular case we deal with different kinds of sequential components:
programs of the form μ.P or inp(a)?P_Q, agents ⟨a⟩, and terms on(a, P) representing
idle processes on(a).P. Besides these classes of components corresponding
directly to terms of the language, we need to introduce a new kind of
component, arrived(a, P), used to model event notification.



[Figure: net fragment with places notify(a, P).Q, dec(Q), ⟨a⟩, and dec(P).]

Fig. 2. Modeling event notification.



The way we represent input and output operations in our net semantics is
standard. More interesting is the mechanism used to model event notification,
represented in Figure 2. Whenever a new token is introduced in the place ⟨a⟩, each
token in a place on(a, P) is transferred to the corresponding place arrived(a, P).
In order to realize this, we use a transfer arc that moves all the tokens inside
the source place to the target one. Each token introduced in arrived(a, P) will
be responsible for the activation of a new instance of P. Moreover, when the
activation happens, a token is also introduced in on(a, P) in order to register
interest in the next production of a token in ⟨a⟩.

The main drawback of this procedure used to model event notification is that
it is not executed atomically. For instance, a new token in ⟨a⟩ can be produced
before it terminates. In this case, the processes whose corresponding token is still
in the place arrived(a, P) will not be notified of the occurrence of this event.
However, as we will prove in the following, even in the presence of this drawback
the net semantics preserves the existence of infinite computations.

After the informal description of the net semantics we introduce its formal
definition. Given the agent P, we define the corresponding contextual P/T system
Net(P). In order to do this, we need the following notations.

— Let S be the set

{P | P sequential program} ∪ {⟨a⟩ | a message name} ∪
{on(a, P), arrived(a, P) | a message name and P program}.

— Let the function $dec : Agent \to \mathcal{M}_{fin}(S)$ be the decomposition of agents into
markings, reported in Table 3.

— Let T contain the transitions obtained as instances of the axiom schemata
presented in Table 4.

The axioms in Table 3, describing the decomposition of agents, state that the
agent 0 generates no tokens; the decomposition of the terms ⟨a⟩ and of the other
processes produces one token in the corresponding place; the decomposition of
the idle process on(a).P generates one token in place on(a, P); and the parallel
composition is interpreted as multiset union, i.e., the decomposition of P|Q is
$dec(P) \oplus dec(Q)$.

The axioms in Table 4 define the possible transitions. Axiom in(a,Q) deals
with the execution of the primitive in(a): a token from place ⟨a⟩ is consumed.
Axiom out(a,Q) describes how the emission of a new datum is obtained: a new
token in the place ⟨a⟩ is introduced and the transfer arcs move all the tokens
from the places on(a, R) to the corresponding arrived(a, R). In this way, all the
idle agents are notified. The activation of the corresponding processes R requires
a further step described by the axiom arrived(a,Q): an instance of process Q
is activated (by introducing tokens in the corresponding places) and a token is
reintroduced in the place on(a, Q) in order to register interest in the next token
produced in ⟨a⟩. Axiom !in(a,Q) deals with the bang operator: if a token is
present in place !in(a).Q and a token can be consumed from place ⟨a⟩, then a
new copy of dec(Q) is produced and a token is reintroduced in !in(a).Q. Finally,
axiom notify(a,Q,R) produces a token in the place on(a, Q) in order to register
interest in the arrival of future tokens in ⟨a⟩.

Definition 2. Let P be an agent. We define the triple $Net(P) = (S, T, m_0)$
where:

S = {Q | Q sequential subprogram of P} ∪
{⟨a⟩ | a message name in P} ∪
{on(a, Q), arrived(a, Q) | a message name in P and Q subprogram of P}

$T = \{ c \rhd\!\xrightarrow{f|_S} p \mid c \rhd\!\xrightarrow{f} p$ is an instance of the axiom schemata in Table 4 and $dom(c) \subseteq S \}$

$m_0 = dec(P)$

where by $f|_S$ we mean the restriction of function f to its subdomain S.

It is not difficult to see that Net(P) is well defined, in the sense that it is a correct
P/T net with transfer arcs; moreover, it is finite. The net semantics is also
complete, as it simulates all the possible computations allowed by the operational
semantics.

Table 3. Decomposition function.

$dec(0) = \emptyset$    $dec(\langle a \rangle) = \{\langle a \rangle\}$
$dec(\mu.P) = \{\mu.P\}$    $dec(on(a).P) = \{on(a, P)\}$
$dec(P|Q) = dec(P) \oplus dec(Q)$



Table 4. Transition specification.

in(a,Q):        $in(a).Q \oplus \langle a \rangle \rhd\!\to dec(Q)$
out(a,Q):       $out(a).Q \rhd\!\xrightarrow{f} \langle a \rangle \oplus dec(Q)$
                where $f = \{(on(a, R), arrived(a, R)) \mid R$ is a program$\}$
arrived(a,Q):   $arrived(a, Q) \rhd\!\to dec(Q) \oplus on(a, Q)$
!in(a,Q):       $!in(a).Q \oplus \langle a \rangle \rhd\!\to\ !in(a).Q \oplus dec(Q)$
notify(a,Q,R):  $notify(a, Q).R \rhd\!\to on(a, Q) \oplus dec(R)$



Theorem 3. Let $Net(P) = (S, T, m_0)$ and let R be an agent s.t.
$dom(dec(R)) \subseteq S$. If $R \longrightarrow R'$ then there exists a
transition sequence $\sigma$ s.t. $dec(R) \stackrel{\sigma}{\longrightarrow} dec(R')$.

The above theorem proves the completeness of the net semantics which, on the
other hand, is not sound. Indeed, as we have already discussed, the encoding
introduces some slightly different computations, because the event
notification mechanism is not modelled atomically. However, the introduction of
these computations does not alter the possibility of having an infinite
computation, as the following theorem shows.

Theorem 4. Let $Net(P) = (S, T, m_0)$ and let R be an agent s.t.
$dom(dec(R)) \subseteq S$. There exists an infinite firing sequence starting
from $dec(R)$ iff $R\uparrow$, i.e., iff R admits an infinite computation.

5 Comparing $\mathcal{L}_{ntf,inp}$ and $\mathcal{L}_{inp}$

In Section 3 we proved that in and out are not sufficiently powerful to encode
the event notification mechanism; now we show that adding the inp operation
makes it possible to encode $\mathcal{L}_{ntf,inp}$ in $\mathcal{L}_{inp}$.

In order to simulate event notification we force each process performing a
notify(a,P) to declare its interest in the incoming $\langle a \rangle$ by
emitting $\langle wait_a \rangle$. Then, the process remains idle, waiting for
$\langle arrived_a \rangle$, which signals that an instance of
$\langle a \rangle$ has appeared. When an output operation out(a) is performed,
a protocol composed of three phases is started.
Table 5. Encoding the notify primitive (n(P) denotes the set of message names of P).

\[
\begin{array}{rcl}
[\![0]\!] &=& 0 \\
[\![\langle a \rangle]\!] &=& \langle a \rangle \\
[\![in(a).P]\!] &=& in(a).[\![P]\!] \\
[\![!in(a).P]\!] &=& !in(a).[\![P]\!] \\
[\![out(a).P]\!] &=& in(me_a).out(wc_{a,P}) \mid O(a,P) \\
[\![inp(a)?P\_Q]\!] &=& inp(a)?[\![P]\!]\_[\![Q]\!] \\
[\![on(a).P]\!] &=& \langle wait_a \rangle \mid W(a,P) \mid\ !in(w_{a,P}).W(a,P) \\
[\![P \mid Q]\!] &=& [\![P]\!] \mid [\![Q]\!] \\
[\![notify(a,P).Q]\!] &=& in(me_a).out(wait_a).out(w_{a,P}).out(me_a).(!in(w_{a,P}).W(a,P) \mid [\![Q]\!]) \\[6pt]
ME_A &=& \prod_{a \in A} \langle me_a \rangle \\
W(a,P) &=& in(arrived_a).out(wait_a).out(ack_a).out(w_{a,P}).[\![P]\!] \\
O(a,P) &=& !in(wc_{a,P}).inp(wait_a)?(out(creating_a).out(wc_{a,P}))\_(out(a).out(ca_{a,P})) \;\mid \\
 & & !in(ca_{a,P}).inp(creating_a)?(out(arrived_a).out(askack_a).out(ca_{a,P}))\_out(ea_{a,P}) \;\mid \\
 & & !in(ea_{a,P}).inp(askack_a)?(in(ack_a).out(ea_{a,P}))\_(out(me_a).[\![P]\!])
\end{array}
\]




In the first phase, each $\langle wait_a \rangle$ is replaced by
$\langle creating_a \rangle$. At the end of this phase $\langle a \rangle$ is
produced.

In the second phase, we transform each $\langle creating_a \rangle$ into the
pair of agents $\langle arrived_a \rangle$ and $\langle askack_a \rangle$.

The agents $\langle arrived_a \rangle$ wake up the processes that were waiting
for the notification of the addition of $\langle a \rangle$; each of these
processes produces a new instance of $\langle wait_a \rangle$ (to be notified of
the next emissions of $\langle a \rangle$) and an $\langle ack_a \rangle$, to
signal that it has been woken. We use two separate renaming phases (from
$wait_a$ to $creating_a$ and then to $arrived_a$) so that a process that has
just been woken (and has emitted $\langle wait_a \rangle$ to be notified of the
next output of a) is not woken twice.

In the third phase the $\langle ack_a \rangle$ emitted by the woken processes
are matched with the $\langle askack_a \rangle$ emitted in the second phase;
this ensures that all the processes waiting for the emission of
$\langle a \rangle$ have been woken.

The concurrent execution of two or more output protocols could provoke
undesired behaviour (for example, some waiting process might be notified of a
single occurrence of output, instead of two); for this reason the output
protocol is performed in mutual exclusion with other output protocols producing
a datum with the same name. For similar reasons we also avoid the concurrent
execution of the output protocol with a notification protocol concerning the
same kind of datum. This is achieved by means of $\langle me_a \rangle$, which
is consumed at the beginning of the protocol and reproduced at the end.

Note that, in the implementation of this protocol, the inp operator is
necessary in order to apply a transformation to all the occurrences of a datum
in the dataspace. Indeed, with only a blocking input in it is not possible to
solve this problem. The formal definition of the encoding is presented in
Table 5.
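
The three phases can be replayed on a multiset dataspace with a short
sequential sketch. It is only an illustration under simplifying assumptions:
the function name `output_protocol` is hypothetical, the mutual-exclusion token
is omitted, and the waiting processes are interleaved eagerly (each one reacts
as soon as its arrived datum appears) instead of running concurrently.

```python
from collections import Counter

def output_protocol(space, a):
    """Replay the three phases started by out(a) on the dataspace `space`.
    Returns how many waiting processes were woken."""
    wait, creating, arrived, askack, ack = (
        f"{name}_{a}" for name in ("wait", "creating", "arrived", "askack", "ack")
    )
    # Phase 1: while inp(wait_a) succeeds, rename wait_a into creating_a;
    # when it fails, emit the datum <a>.
    while space[wait] > 0:
        space[wait] -= 1
        space[creating] += 1
    space[a] += 1
    # Phase 2: rename each creating_a into the pair arrived_a, askack_a.
    woken = 0
    while space[creating] > 0:
        space[creating] -= 1
        space[arrived] += 1
        space[askack] += 1
        # A waiting process consumes arrived_a, re-registers and acknowledges.
        # Its fresh wait_a is NOT renamed again: phase 1 is already over.
        space[arrived] -= 1
        space[wait] += 1
        space[ack] += 1
        woken += 1
    # Phase 3: match every askack_a with an ack_a.
    while space[askack] > 0:
        assert space[ack] > 0  # in(ack_a) would block otherwise
        space[askack] -= 1
        space[ack] -= 1
    return woken

space = Counter({"wait_a": 3})   # three processes registered on a
woken = output_protocol(space, "a")
```

If phase 1 renamed wait_a directly into arrived_a, the wait_a re-emitted by a
woken process could be picked up by the same renaming loop, which is exactly
the double wake-up that the intermediate creating_a name rules out.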

The proof of the correctness of the encoding is essentially based on an
intermediate mapping, where partially executed out and notify protocols are
represented with an abstract notation. We report here only the statements of
the main results.

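The following theorem states that each move performed by a process in
$\mathcal{L}_{ntf,inp}$ can be mimicked by a sequence of moves of its encoding.

Theorem 5. Let P be a term of $\mathcal{L}_{ntf,inp}$ s.t. $n(P) \subseteq A$.
If $P \longrightarrow P'$ then

\[
[\![P]\!] \mid ME_A \mid \prod_{i=1 \ldots k} O(a_i, P_i)
\;\longrightarrow^{*}\;
[\![P']\!] \mid ME_A \mid \prod_{i=1 \ldots h} O(b_i, Q_i).
\]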

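The next result says that any computation of the encoding of P can be
extended in order to reach the encoding of a process reachable from P.

Theorem 6. Let P be a term of $\mathcal{L}_{ntf,inp}$ s.t. $n(P) \subseteq A$.
If $[\![P]\!] \mid ME_A \mid \prod_{i=1 \ldots k} O(a_i, P_i) \longrightarrow^{*} Q$
then there exists $P'$ such that $P \longrightarrow^{*} P'$ and
$Q \longrightarrow^{*} [\![P']\!] \mid ME_A \mid \prod_{i=1 \ldots h} O(b_i, Q_i)$.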

6 Conclusion 

We investigated the expressiveness of event notification in a data-driven
coordination model. We proved that the addition of the notify primitive
strictly increases the expressiveness of a language with only in and out, but
leaves it unchanged if the language also contains inp. On the other hand, we
showed that the inp primitive cannot be encoded by in, out, and notify.

We embedded the coordination primitives in a minimal language. The relevance
of our results extends to richer languages in the following way. The
encodability result extends to any language comprising the minimal features of
our calculus. The negative results of non-encodability can be interpreted, on a
Turing complete language, as the necessity for an encoding to exploit the
specific computational features of the considered language.

We think that results of this kind are not only of theoretical relevance, but
could also be of interest to designers and implementors of coordination
languages. For example, the powerful inp primitive has been a source of
problems in the first distributed implementations of Linda (see, e.g., [10]).
The results proved here suggest that the notify primitive may represent a good
compromise between ease of implementation and expressive power.

In [3] we consider three different interpretations for the out operation and
in [4] we found an expressiveness gap between two of them. More precisely,
we proved that a language with in, out, and inp is Turing powerful under the
ordered semantics (the one considered here), while it is not under the unordered
one (where the emission and the effective introduction of data in the dataspace
are two independent steps). In [5] we investigate the impact of event
notification on the unordered semantics: we prove that the addition of the
notify primitive makes the language Turing powerful also under the unordered
interpretation, and that it permits a faithful encoding of the ordered semantics
on top of the unordered one.

Here, we have chosen the ordered interpretation as it is the semantics adopted
by the actual JavaSpaces specifications, as indicated in Sections 2.3 and 2.8
of [18], and also confirmed to us in personal communications with John McClain
of Sun Microsystems Inc. [11].
Abstract. This paper presents a new closure conversion algorithm for
simply-typed languages. We have implemented the algorithm as part of
MLton, a whole-program compiler for Standard ML (SML). MLton first
applies all functors and eliminates polymorphism by code duplication to
produce a simply-typed program. MLton then performs closure conversion
to produce a first-order, simply-typed program. In contrast to typical
functional language implementations, MLton performs most optimizations
on the first-order language, after closure conversion.
There are two notable contributions of our work:

1. The translation uses a general flow-analysis framework which includes
0CFA. The types in the target language fully capture the results
of the analysis. MLton uses the analysis to insert coercions to
translate between different representations of a closure to preserve
type correctness of the target language program.

2. The translation is practical. Experimental results over a range of
benchmarks including large real-world programs such as the compiler
itself and the ML-Kit [25] indicate that the compile-time cost of
flow analysis and closure conversion is extremely small, and that the
dispatches and coercions inserted by the algorithm are dynamically
infrequent.



1 Introduction 

This paper presents a new closure conversion algorithm for simply-typed
languages. We have implemented the algorithm as part of MLton¹, a whole-program
compiler for Standard ML (SML). MLton first applies all functors and
eliminates polymorphism by code duplication to produce a simply-typed program.
MLton then performs closure conversion to produce a first-order, simply-typed
program. Unlike typical functional language implementations, MLton performs
most optimizations on the first-order language, after closure conversion. The
most important benefit of this approach is that numerous optimization
techniques developed for other first-order languages can be immediately applied.
In addition, a simply-typed intermediate language simplifies the overall
structure of the compiler. Our experience with MLton indicates that simply-typed
intermediate languages are sufficiently expressive to efficiently compile
higher-order languages like Standard ML.

¹ MLton is available under GPL from http://www.neci.nj.nec.com/PLS/MLton/.

G. Smolka (Ed.): ESOP/ETAPS 2000, LNCS 1782, pp. 56-71, 2000.

An immediate question that arises in pursuing this strategy concerns the 
representation of closures. Closure conversion transforms a higher-order program 
into a first-order one by representing each procedure with a tag identifying the 
code to be executed (typically a code pointer) when the procedure is applied, 
and an environment containing the values of the procedure’s free variables. The 
code portion of a procedure is translated to take its environment as an extra 
argument. 
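
As an illustration of this representation (not MLton's actual data layout), a
closure can be modelled as a record pairing a code pointer with an environment.
The names `Closure`, `add_n_code`, `make_adder`, and `apply` are hypothetical
and chosen only for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass(frozen=True)
class Closure:
    code: Callable  # first-order code; takes the environment as an extra argument
    env: Tuple      # values of the procedure's free variables

# Source procedure: fn x => x + n, where n is a free variable.
def add_n_code(env, x):
    (n,) = env          # recover the free variable from the environment
    return x + n

def make_adder(n):
    # Building the closure captures the current value of n.
    return Closure(add_n_code, (n,))

def apply(clo, arg):
    # Application jumps through the code pointer, passing the environment.
    return clo.code(clo.env, arg)

result = apply(make_adder(3), 4)
```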

Like previous work on defunctionalization [19,3], the translation implements
closures as elements of a datatype, and dispatches at call-sites to the
appropriate code. We differ in that the datatypes in the target language
express all procedures that may be called at the same call-site as determined
by flow analysis. Consequently, the size of dispatches at calls is inversely
related to the precision of the analysis.
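
A minimal sketch of the dispatch-based alternative follows. It assumes a call
site that, according to some flow analysis, can only be reached by two
procedures; the classes `AddN` and `Double` and the function `apply_site` are
hypothetical names standing in for the constructors of one generated datatype
and the dispatch at one call site.

```python
from dataclasses import dataclass

# One constructor per procedure that may reach the call site; the fields
# hold the procedure's free variables.
@dataclass(frozen=True)
class AddN:
    n: int

@dataclass(frozen=True)
class Double:
    pass

def add_n_code(env, x):   # code for: fn x => x + n
    return x + env.n

def double_code(env, x):  # code for: fn x => 2 * x
    return 2 * x

def apply_site(f, x):
    # The dispatch has one branch per constructor; a more precise analysis
    # yields a smaller datatype and therefore a smaller dispatch.
    if isinstance(f, AddN):
        return add_n_code(f, x)
    if isinstance(f, Double):
        return double_code(f, x)
    raise TypeError("value cannot flow to this call site")

r1 = apply_site(AddN(3), 4)
r2 = apply_site(Double(), 4)
```

Because each branch names its target function directly, every call in the
translated program is a known first-order call.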

Using dispatches instead of code pointers to express function calls has two
important benefits: (1) the target language can remain simply-typed without the
need to introduce existential types [16], and (2) optimizations can use different
calling conventions for different procedures applied at the same call-site.
However, if the simplicity and optimization opportunities afforded by using
dispatches are masked by the overhead of the dispatch itself, this strategy
would be inferior to one in which the code pointer is directly embedded within
the closure record. We show that the cost of dispatches for the benchmarks we
have measured is a small fraction of the benchmark’s overall execution time. We
elaborate on these issues in Sections 4 and 6.

Our approach extends the range of expressible flow analyses beyond that of
previous work [26] by inserting coercions in the target program that preserve
a closure’s meaning, but change its type. Using coercions, the translation
expresses higher-order flow information in the first-order target language in a
form verifiable by the type system. Since the results of flow analysis are
completely expressed in the types of the target program, ordinary optimizations
performed on the target automatically take advantage of flow information
computed on the source. In Section 4, we show that representations can be chosen
so that coercions have no runtime cost.

Experimental results over a range of benchmarks including the compiler itself 
(approximately 47K lines of SML code) and the ML Kit (approximately 75K 
lines) indicate that the compile-time cost of flow analysis and closure conversion 
is small, and that local optimizations can eliminate almost all inserted coercions. 
Also, MLton often produces code significantly faster than the code produced by 
Standard ML of New Jersey [1]. 

The remainder of the paper is structured as follows. Section 2 describes the
source and target languages for the closure converter. Section 3 defines the
class of flow analyses that the translation can use. Section 4 presents the
closure conversion algorithm. A detailed example illustrating the algorithm is
given in Section 5. Section 6 describes MLton and presents experimental
results. Section 7 presents related work and conclusions.
2 Source and Target Languages

We illustrate our flow-directed closure conversion translation using the source
language shown on the left-hand side of Figure 1. A program consists of a
collection of datatype declarations followed by an expression. As in ML, a
datatype declaration defines a new sum type along with constructors to create
and discriminate among values of that type. The source language is a lambda
calculus core augmented by constructor application, case, tuple construction,
selection of tuple components, and exceptions. Exceptions are treated as
elements of a datatype. The source language is simply-typed, where types are
either type constructors, arrow types, or tuple types. We omit the type rules
and assume that every expression and variable is annotated with its type. We
write $e : \tau$ to mean that e has type $\tau$. We write $x : \tau$ to mean
that variable x has type $\tau$. We assume that all bound variables in a program
are distinct. We use Exp, Bind, Lam, App, and Tuple to name the sets of specific
occurrences of subterms of the forms e, b, fn x => e, y z, and (..., x, ...),
respectively, in the given program (occurrences can be defined formally using
paths or unique expression labels). TyCon names the set of datatypes declared
in a program.

Like the source language, the target language (see right-hand side of Figure 1) 
is simply-typed, but without arrow types, since the target language does not 
contain lambda expressions. A target language program is prefixed by a collection 
of mutually recursive first-order functions, and function application explicitly 
specifies the first-order function to be called. 



Source Language

    C $\in$ Con
    t $\in$ Tycon
    w, x, y, z $\in$ Var

    $\tau$ ::= t
         | $\tau$ -> $\tau$
         | ... * $\tau$ * ...
    P    ::= let ... data ... in e end
    data ::= datatype t = ... | C of $\tau$ | ...
    e    ::= x
         | let x = b in e end
    b    ::= e
         | fn w => e
         | y z
         | C y
         | case y of ... | C z => e | ...
         | (..., y, ...)
         | #i y
         | raise y
         | e1 handle y => e2

Target Language

    f $\in$ Func

    $\tau$ ::= t
         | ... * $\tau$ * ...
    P    ::= let ... data ... in
               let ... fun ... in e end
             end
    data ::= datatype t = ... | C of $\tau$ | ...
    fun  ::= fun f (..., x, ...) = e
    e    ::= x
         | let x = b in e end
    b    ::= e
         | f(..., y, ...)
         | C y
         | case y of ... | C z => e | ...
         | (..., y, ...)
         | #i y
         | raise y
         | e1 handle y => e2

Fig. 1. Source and target languages.
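\[
v \in Value = (Lam \times Env) + Value^{*} + (Con \times Value)
\qquad
\rho \in Env = Var \rightarrow Value
\]

Judgment forms: $\rho, e \hookrightarrow v/p$ and $\rho, b \hookrightarrow v/p$.

\[
\rho, x \hookrightarrow \rho(x)
\]

\[
\dfrac{\rho, b \hookrightarrow v_b \qquad \rho[x \mapsto v_b], e \hookrightarrow v/p}
      {\rho, \texttt{let } x = b \texttt{ in } e \texttt{ end} \hookrightarrow v/p}
\qquad
\dfrac{\rho, b \hookrightarrow p}
      {\rho, \texttt{let } x = b \texttt{ in } e \texttt{ end} \hookrightarrow p}
\]

\[
\rho, \texttt{fn } w \texttt{ => } e \hookrightarrow
  (\texttt{fn } w \texttt{ => } e,\ \rho|_{FV(\texttt{fn } w \texttt{ => } e)})
\]

\[
\dfrac{\rho(y) = (\texttt{fn } w \texttt{ => } e, \rho') \qquad \rho'[w \mapsto \rho(z)], e \hookrightarrow v/p}
      {\rho, y\ z \hookrightarrow v/p}
\]

\[
\rho, C\ y \hookrightarrow (C, \rho(y))
\]

\[
\dfrac{\rho(y) = (C, v) \qquad \rho[z \mapsto v], e \hookrightarrow v'}
      {\rho, \texttt{case } y \texttt{ of } \ldots \mid C\ z \texttt{ => } e \mid \ldots \hookrightarrow v'}
\qquad
\dfrac{\rho(y) = (C, v) \qquad \rho[z \mapsto v], e \hookrightarrow p}
      {\rho, \texttt{case } y \texttt{ of } \ldots \mid C\ z \texttt{ => } e \mid \ldots \hookrightarrow p}
\]

\[
\rho, (\ldots, y, \ldots) \hookrightarrow (\ldots, \rho(y), \ldots)
\qquad
\dfrac{\rho(y) = (\ldots, v_i, \ldots)}
      {\rho, \#i\ y \hookrightarrow v_i}
\]

\[
\rho, \texttt{raise } y \hookrightarrow [\rho(y)]
\]

\[
\dfrac{\rho, e_1 \hookrightarrow v}
      {\rho, e_1 \texttt{ handle } y \texttt{ => } e_2 \hookrightarrow v}
\qquad
\dfrac{\rho, e_1 \hookrightarrow [v_1] \qquad \rho[y \mapsto v_1], e_2 \hookrightarrow v_2}
      {\rho, e_1 \texttt{ handle } y \texttt{ => } e_2 \hookrightarrow v_2}
\]

Fig. 2. Source language semantics.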



We specify the source language semantics via the inductively defined relations
in Figure 2. For example, expression evaluation, defined via the relation
written $\rho, e \hookrightarrow v/p$, is pronounced “in environment $\rho$,
expression e evaluates either to value v or to an exception packet p.” In this
regard, the semantics of exceptions is similar to the presentation given in
[15]. We write [v] to denote an exception packet containing the value v. A value
is either a closure, a tuple of values, or a value built by application of a
datatype constructor. The semantic rules for the target language are identical
except for the rule for function application:

\[
\dfrac{[\ldots x_i \mapsto \rho(y_i) \ldots], e \hookrightarrow v/p}
      {\rho, f(\ldots, y_i, \ldots) \hookrightarrow v/p}
\]

where fun f (..., x_i, ...) = e is a function declaration in the program.
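
The source semantics can be transcribed into a small evaluator. The sketch
below is an illustration under stated assumptions, not part of the paper: terms
are encoded as Python tuples, the class `Packet` plays the role of an exception
packet [v], and closures capture the whole environment instead of its
restriction to the free variables. All names are hypothetical.

```python
class Packet:
    """An exception packet [v]."""
    def __init__(self, value):
        self.value = value

def eval_exp(env, e):
    # e ::= ('var', x) | ('let', x, b, e)
    if e[0] == "var":
        return env[e[1]]
    _, x, b, body = e
    vb = eval_bind(env, b)
    if isinstance(vb, Packet):      # a packet aborts the rest of the let
        return vb
    return eval_exp({**env, x: vb}, body)

def eval_bind(env, b):
    tag = b[0]
    if tag in ("var", "let"):       # b ::= e
        return eval_exp(env, b)
    if tag == "fn":                 # ('fn', w, e)
        return ("closure", b, dict(env))
    if tag == "app":                # ('app', y, z)
        _, lam, cenv = env[b[1]]
        return eval_exp({**cenv, lam[1]: env[b[2]]}, lam[2])
    if tag == "con":                # ('con', C, y)
        return (b[1], env[b[2]])
    if tag == "case":               # ('case', y, {C: (z, e)})
        c, v = env[b[1]]
        z, body = b[2][c]
        return eval_exp({**env, z: v}, body)
    if tag == "tuple":              # ('tuple', [y, ...])
        return tuple(env[y] for y in b[1])
    if tag == "sel":                # ('sel', i, y), 1-based like #i y
        return env[b[2]][b[1] - 1]
    if tag == "raise":              # ('raise', y)
        return Packet(env[b[1]])
    if tag == "handle":             # ('handle', e1, y, e2)
        r = eval_exp(env, b[1])
        if isinstance(r, Packet):
            return eval_exp({**env, b[2]: r.value}, b[3])
        return r
    raise ValueError(f"unknown binding form: {tag}")

# let f = fn w => let r = raise w in r end
# in let h = (let a = f z in a end) handle y => y in h end end
prog = ("let", "f", ("fn", "w", ("let", "r", ("raise", "w"), ("var", "r"))),
        ("let", "h",
         ("handle", ("let", "a", ("app", "f", "z"), ("var", "a")), "y", ("var", "y")),
         ("var", "h")))
result = eval_exp({"z": 42}, prog)
```

The integer bound to z in the initial environment is a convenience of the
sketch; the calculus itself has no base values.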
3 Flow Analysis

Our flow analysis is a standard monovariant analysis that uses abstract values
to approximate sets of exact values:

\[
a \in AVal = TyCon + \mathcal{P}(Lam) + AVal^{*}
\]

An abstract value may either be a TyCon, which represents all constructed values
of that type, a set of $\lambda$-occurrences, which represents a set of
closures, or a sequence of abstract values, which represents a set of tuples.

Definition 1. An abstract value a is consistent with a type $\tau$ if and only
if one of the following holds:

1. $a = t$ and $\tau = t$.

2. $a \in \mathcal{P}(Lam)$, $\tau = \tau_1 \rightarrow \tau_2$, and for all $f \in a$, $f : \tau_1 \rightarrow \tau_2$.

3. $a = (\ldots, a_i, \ldots)$, $\tau = \ldots * \tau_i * \ldots$, and $a_i$ is consistent with $\tau_i$ for all i.

We define our flow analysis as a type-respecting [12] function from variables,
constructors, and exception packets to abstract values in the program.

Definition 2. A flow is a function $F : (Var + Con + \{packet\}) \rightarrow AVal$
such that

1. For all x in P, if $x : \tau$ then $F(x)$ is consistent with $\tau$.

2. For all C in P, if C carries values of type $\tau$ then $F(C)$ is consistent with $\tau$.

Informally, $F(x)$ conservatively approximates the set of values that x may take
on at runtime. Similarly, $F(C)$ over-approximates the set of values to which C
may be applied at runtime. The special token packet models exception values;
all exception values are collected into the abstract value $F(packet)$.

To formally specify the meaning of an analysis, we define a pair of relations by
mutual induction. The first, between environments and flows
($\rho \sqsubseteq F$), describes when an environment is approximated by the
flow.

\[
\rho \sqsubseteq F \text{ if for all } x \in dom(\rho),\ \rho(x) \sqsubseteq_F F(x)
\]

The second relation, between values and abstract values ($v \sqsubseteq_F a$),
describes when a value is approximated by an abstract value (relative to a
flow).

1. $(C, v) \sqsubseteq_F t$ if C is a constructor associated with datatype t, and $v \sqsubseteq_F F(C)$.

2. $(\ldots, v_i, \ldots) \sqsubseteq_F (\ldots, a_i, \ldots)$ if $v_i \sqsubseteq_F a_i$ for all i.

3. $(\texttt{fn } x \texttt{ => } e, \rho) \sqsubseteq_F a$ if $\texttt{fn } x \texttt{ => } e \in a$ and $\rho \sqsubseteq F$.

Figure 3 defines a collection of safety constraints such that any flow meeting
them will conservatively approximate the runtime behavior of the program. We
use the following partial order on abstract values:

Definition 3. $a \geq a'$ if and only if

- $a = t = a'$ for some $t \in TyCon$,

- $a \supseteq a'$, where $a, a' \in \mathcal{P}(Lam)$, or

- $a = (\ldots, a_i, \ldots)$, $a' = (\ldots, a'_i, \ldots)$ and $a_i \geq a'_i$ for all i.
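Theorem 1. If F is safe and $\rho \sqsubseteq F$ then

- if $\rho, e \hookrightarrow v$ then $v \sqsubseteq_F F(last(e))$.

- if $\rho, e \hookrightarrow [v]$ then $v \sqsubseteq_F F(packet)$.

- if $\rho, b \hookrightarrow v$ and $x = b \in P$ then $v \sqsubseteq_F F(x)$.

- if $\rho, b \hookrightarrow [v]$ then $v \sqsubseteq_F F(packet)$.

Proof. By induction on $\rho, e \hookrightarrow v$ and $\rho, b \hookrightarrow v$. $\Box$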



Definition 4. The last variable of an expression, which yields the expression’s
value, is defined as follows:

\[
\begin{array}{rcl}
last(x) &=& x \\
last(\texttt{let } x = b \texttt{ in } e \texttt{ end}) &=& last(e)
\end{array}
\]



Definition 5. A flow F is safe if and only if, for all x = b in P,

1. if b is e, then $F(x) = F(last(e))$.

2. if b is fn y => e, then $F(x) \geq \{\texttt{fn } y \texttt{ => } e\}$.

3. if b is y z, then for all fn w => e $\in F(y)$,
   a) $F(w) \geq F(z)$, and
   b) $F(x) \geq F(last(e))$.

4. if b is C y, then $F(C) \geq F(y)$.

5. if b is case y of ... | $C_i\ z_i$ => $e_i$ | ..., then for all i,
   a) $F(z_i) = F(C_i)$, and
   b) $F(x) \geq F(last(e_i))$.

6. if b is (..., $y_i$, ...), then $F(x) = (\ldots, F(y_i), \ldots)$.

7. if b is #i y and $F(y) = (\ldots, a_i, \ldots)$ then $F(x) = a_i$.

8. if b is raise y then $F(packet) \geq F(y)$.

9. if b is $e_1$ handle z => $e_2$ then $F(z) \geq F(packet)$, $F(x) \geq F(last(e_1))$,
   and $F(x) \geq F(last(e_2))$.

Fig. 3. Safety constraints on flows.



The constraints are standard for a monovariant control-flow analysis [9,17]
with the following two exceptions. First, rule 4 merges all arguments to a
constructor. This is to avoid introducing recursive coercions, and to reduce the
number of coercions performed at runtime. Second, we use “=” instead of “$\geq$”
in some flow constraints to simplify the specification of the translation,
although it is straightforward to incorporate the extra generality in practice.
One can also prove that for any program, there is a minimum safe flow; this
corresponds to the usual 0CFA. Another example of a safe flow is the
unification-based flow analysis described by Henglein [11] and used by Tolmach
and Oliva [26]. We can view this analysis as adhering to the safety constraints
in Figure 3 with containment ($\geq$) replaced by equality in the rules.
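
To make the constraints concrete, the following sketch computes the least flow
satisfying rules 1 to 3 for programs that contain only variables, let, fn, and
application. It is an illustrative fixpoint iteration under those
restrictions, not the analysis implemented in MLton; the tuple encoding of
terms and the names `last`, `bindings`, and `zero_cfa` are assumptions of the
sketch.

```python
def last(e):
    # Definition 4: follow let bodies down to the final variable.
    while e[0] == "let":
        e = e[3]
    return e[1]

def bindings(e, out):
    # Collect every binding x = b, including those inside lambda bodies.
    while e[0] == "let":
        _, x, b, body = e
        out.append((x, b))
        if b[0] == "fn":
            bindings(b[2], out)
        elif b[0] == "let":
            bindings(b, out)
        e = body
    return out

def zero_cfa(prog):
    flow = {}                      # variable -> set of lambda occurrences
    binds = bindings(prog, [])
    changed = True

    def include(x, vals):
        nonlocal changed
        cur = flow.setdefault(x, set())
        if not vals <= cur:
            cur |= vals
            changed = True

    while changed:                 # iterate until no constraint adds anything
        changed = False
        for x, b in binds:
            if b[0] == "fn":       # rule 2: F(x) >= {fn y => e}
                include(x, {b})
            elif b[0] == "app":    # rule 3, for each fn w => e in F(y)
                _, y, z = b
                for lam in list(flow.get(y, ())):
                    include(lam[1], set(flow.get(z, ())))            # 3a
                    include(x, set(flow.get(last(lam[2]), ())))      # 3b
            else:                  # rule 1: b is an expression
                include(x, set(flow.get(last(b), ())))
    return flow

ident = ("fn", "w", ("var", "w"))
g = ("fn", "v", ("var", "v"))
h = ("fn", "u", ("var", "u"))
prog = ("let", "id", ident, ("let", "g", g, ("let", "h", h,
        ("let", "r1", ("app", "id", "g"),
         ("let", "r2", ("app", "id", "h"), ("var", "r2"))))))
flow = zero_cfa(prog)
```

Since the analysis is monovariant, the parameter w of the identity collects
both arguments, so r1 and r2 are each approximated by the set containing both
g and h.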
4 Closure Conversion

Given a safe flow F for the following source program:

    let ... datatype t = ... | C of $\tau$ | ... in e end

the closure conversion algorithm produces the following target program:

    let datatype t = ... | C of $\mathcal{T}(F(C))$ | ...
        datatype $\mathcal{T}(L)$ = ... | $\mathcal{C}(L,$ fn x => e$)$ of (... * $\mathcal{T}(F(y_i))$ * ...) | ...
    in let ...
           fun $\mathcal{N}($fn x => e$)$ (r, x) = let ... $y_i$ = #i r ... in $[\![e]\!]$ end
       in $[\![e]\!]$
       end
    end

The translation inserts one datatype declaration for each set L that appears in
the range of F, with one constructor for each $\lambda$-expression in L. We
write $\mathcal{T}(L)$ to denote the new datatype for L and
$\mathcal{C}(L,$ fn x => e$)$ to denote the name of the constructor
corresponding to fn x => e $\in L$. The constructor’s argument has the type of
the tuple of free variables of fn x => e, that is $(\ldots, y_i, \ldots)$. We
extend $\mathcal{T}$ to abstract values by defining $\mathcal{T}(t) = t$ and
$\mathcal{T}((\ldots, a_i, \ldots)) = \ldots * \mathcal{T}(a_i) * \ldots$

The translation also creates one function declaration for each
$\lambda$-expression that occurs in the source program. The name of the target
language first-order function for fn x => e is denoted by
$\mathcal{N}($fn x => e$)$. Each function extracts all the free variables of the
closure record passed as the first argument, and then continues with the
translated body.

The translation uses auxiliary functions $[\![\cdot]\!] : Exp \rightarrow Exp$
and $[\![\cdot]\!]_x : Bind \rightarrow Bind$, which appear in Figure 4. The
interesting cases in the translation are for $\lambda$-expressions and
application. Rule 2b builds a closure record by applying the appropriate
constructor to the tuple of the procedure’s free variables. Rule 2c translates
an application to a dispatch on the closure record of the procedure being
applied. Because the safety constraints only require containment instead of
equality, the translation inserts coercions at program points where the flow
becomes less precise.

The coercion function $\mathcal{X}$, defined in Figure 5, changes the
representation of a value from a more precise to a less precise type. For
example, the translation of an application may require coercions at two points.
First, if the abstract value of the argument is more precise than the formal, a
coercion is inserted to change the argument’s type to the formal’s. Second, a
coercion is required if the abstract value of the result is more precise than
the abstract value of the variable to which it becomes bound.
1. a) $[\![$let x = b in e end$]\!]$ = let x = $[\![b]\!]_x$ in $[\![e]\!]$ end
   b) $[\![x]\!]$ = x

2. a) $[\![e]\!]_x$ = $[\![e]\!]$

   b) $[\![$fn w => e$]\!]_x$ = C (..., y, ...),
      where C = $\mathcal{C}(F(x),$ fn w => e$)$ and FV(fn w => e) = ... y ....

   c) $[\![y\ z]\!]_x$ = case y of
          ...
          | $\mathcal{C}(F(y),$ fn w => e$)$ r => let z' = $\mathcal{X}(z, F(z), F(w))$
                                        v = $\mathcal{N}($fn w => e$)$ (r, z')
                                        v' = $\mathcal{X}(v, F(last(e)), F(x))$
                                    in v'
                                    end
          ...
      where there is one branch for each fn w => e $\in F(y)$ and z', v, and v'
      are fresh.

   d) $[\![C\ y]\!]_x$ = let y' = $\mathcal{X}(y, F(y), F(C))$
                      r = C y'
                  in r
                  end
      where y' and r are fresh variables.

   e) $[\![$case y of ... | C z => e | ...$]\!]_x$ =
          case y of
          ...
          | C z => let r = $[\![e]\!]$
                       r' = $\mathcal{X}(r, F(last(e)), F(x))$
                   in r'
                   end
          ...
      where r, r' are fresh variables.

   f) $[\![$(..., y, ...)$]\!]_x$ = (..., y, ...)

   g) $[\![$#i y$]\!]_x$ = #i y

   h) $[\![$raise y$]\!]_x$ = raise y

   i) $[\![e_1$ handle z => $e_2]\!]_x$ = let $y_1$ = $[\![e_1]\!]$
                                     $y_2$ = $\mathcal{X}(y_1, F(last(e_1)), F(x))$
                                 in $y_2$ end
                                 handle z => let $y_3$ = $[\![e_2]\!]$
                                                 $y_4$ = $\mathcal{X}(y_3, F(last(e_2)), F(x))$
                                             in $y_4$ end

Fig. 4. Closure conversion of expressions.



4.1 Practical Issues 

Although for a simple type system we must express coercions as a case
expression with each arm simply changing the constructor (and the type)
representing the closure, it is easy to pick an underlying representation for
these datatypes so that no machine code actually has to be generated. In terms
of the underlying memory objects, all coercions are the identity. If these
datatypes are all represented as a tag word (whose only function is to
distinguish between the summands forming the datatype) followed by some fixed
representation of the value being carried by that summand, then the only thing
which might be changed by the coercion function is the tag word. It is thus easy
to pick the tags so that they also do not change (for instance, use the address
of the code of the procedure). However, we do not do this in MLton. As shown in
Section 6, dynamic counts indicate coercions are so rare that their cost is
unimportant. The advantage of allowing the coercions to change representations
is that one can choose specialized representations for environment records.

We define $\mathcal{X} : Var \times AVal \times AVal \rightarrow Bind$ by cases
on abstract values. (Note, $\mathcal{X}(x, a, a')$ is only defined when
$a \leq a'$.)

1. if $a = a'$ then $\mathcal{X}(x, a, a') = x$.

2. $\mathcal{X}(x, (\ldots, a_i, \ldots), (\ldots, a'_i, \ldots))$ = let ...
                                               $y_i$ = #i x
                                               $y'_i$ = $\mathcal{X}(y_i, a_i, a'_i)$
                                               ...
                                               z' = (..., $y'_i$, ...)
                                           in z'
                                           end
   where z', ..., $y_i$, $y'_i$, ... are fresh variables.

3. $\mathcal{X}(x, L, L')$ = case x of
                      ...
                      | $\mathcal{C}(L,$ fn x => e$)$ r => $\mathcal{C}(L',$ fn x => e$)$ r
                      ...
   where there is one branch for each fn x => e $\in L$.

Fig. 5. The coercion function.
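
The three cases of the coercion function can be mirrored on a toy
representation. In the sketch below, which is an illustration and not MLton
code, an abstract value is a string (a type constructor), a tuple (a tuple of
abstract values), or a frozenset of lambda labels; the class `Clo` and the
function `coerce` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Clo:
    dtype: frozenset  # the set L, naming the datatype the closure belongs to
    lam: str          # which lambda, i.e. which constructor of that datatype
    env: tuple        # the tuple of free-variable values

def coerce(v, src, dst):
    """Coerce v from abstract value src to the less precise dst."""
    if src == dst:                       # case 1: nothing to do
        return v
    if isinstance(src, tuple):           # case 2: coerce componentwise
        return tuple(coerce(x, s, d) for x, s, d in zip(v, src, dst))
    # case 3: sets of lambdas with src contained in dst; only the tag changes.
    assert src <= dst and v.dtype == src
    return Clo(dst, v.lam, v.env)

L1 = frozenset({"fn a"})
L2 = frozenset({"fn a", "fn c"})
v = Clo(L1, "fn a", (1,))
w = coerce(v, L1, L2)
pair = coerce((v, 5), (L1, "int"), (L2, "int"))
```

The lambda label and the environment are carried over unchanged, which is why
a suitable low-level representation can make the coercion a no-op.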

The closure conversion algorithm is designed to be safe-for-space [1]. Note
that each closure record is destructed at the beginning of each first-order
function. The alternative of replacing each reference to a closed-over variable
with a selection from the closure record violates space safety because it keeps
the entire record alive. Another possible violation is rule 2c, which can turn a
tail-call into a non-tail-call by requiring a coercion after the call. However,
since each such coercion corresponds to a step up the lattice of abstract
values, which is of finite height, the space usage of the program can only
increase by a constant factor.
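
The difference between destructuring the record on entry and selecting from it
later can be observed directly in a garbage-collected language. The following
sketch is only an analogy in Python, with hypothetical names throughout; it
relies on the reference-counting behaviour of CPython (plus an explicit
`gc.collect()`) to observe when a value becomes unreachable.

```python
import gc
import weakref

class Big:
    """Stand-in for a large value stored in a closure record."""

class Record:
    def __init__(self, small, big):
        self.small, self.big = small, big

def destructuring(record):
    small = record.small         # extract the needed variable on entry
    return lambda: small         # the record itself is not captured

def selecting(record):
    return lambda: record.small  # captures the whole record, including big

def big_survives(make):
    big = Big()
    ref = weakref.ref(big)       # lets us observe whether big is still alive
    thunk = make(Record(1, big))
    del big
    gc.collect()
    alive = ref() is not None
    assert thunk() == 1          # the thunk works the same in both cases
    return alive

kept_by_destructuring = big_survives(destructuring)
kept_by_selecting = big_survives(selecting)
```

Only the selecting variant keeps the large value reachable for as long as the
returned function lives.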

Finally, it is possible to share all of the dispatches generated for calls to a
given set of $\lambda$-expressions. However, MLton does not do this, since it
has not been necessary for performance.
5 Example

Consider the example in Figure 6.

The source appears in part (a), the 0CFA flow is in part (b), and the result
of closure conversion appears in part (c). We use fn a to represent the entire
$\lambda$-expression beginning with fn a. Consider the translation of the last
expression, the call to m. Since m may be bound to a procedure corresponding to
fn b or fn d, the call must dispatch appropriately. For the expression which
defines h, each branch of the case-expression must coerce a procedure
corresponding to a known $\lambda$-expression to one which is associated with an
element of {fn a, fn c}. In the expression defining m, both a dispatch and a
coercion occur: first a dispatch based on the $\lambda$-expression which
provides the code for h is required. Then, each arm of this case expression must
coerce the result (a function with known code) to one associated with either
fn b or fn d.
6 Experiments

We have implemented the algorithm as part of MLton, a whole-program
compiler for Standard ML. MLton does not support separate compilation, and
takes advantage of whole program information in order to perform many
optimizations. Here, we give a brief overview of the relevant compiler passes
and intermediate languages. First, MLton translates the input SML program into
an explicitly-typed, polymorphic intermediate language (XML) [8]. XML does
not have any module level constructs. All functor applications are performed at
compile-time [6], and all uses of structures and signatures are eliminated by
moving declarations to the top-level and appropriately renaming variables. Next,
MLton translates the XML to SXML (a simply-typed language) by
monomorphisation, eliminating all uses of polymorphism by duplicating each
polymorphic expression for each monotype at which it is used. After
monomorphisation, small higher-order functions are duplicated; a size metric is
used to prevent excessive code growth. MLton then performs flow analysis as
described in Section 3 on the resulting SXML, and closure converts procedures to
FOL (a first-order simply-typed language) via the algorithm described in
Section 4. After a series of optimizations (e.g., inlining, tuple flattening,
redundant argument elimination, and loop invariant code motion), the FOL program
is translated to a C program, which is then compiled by gcc. Like [22], a
trampoline is used to satisfy tail-recursion. To reduce trampoline costs,
multiple FOL procedures may reside in the same C procedure; a dispatch on C
procedure entry jumps to the appropriate code [7].
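
The trampoline technique mentioned above can be sketched in a few lines. This
is a generic illustration of the idea, not the C code that MLton emits; the
names `trampoline` and `count_down` and the tagged-pair protocol are
assumptions of the sketch.

```python
def trampoline(step):
    """Run a computation in which every tail call is returned to this loop
    instead of being performed on the host stack."""
    while True:
        tag, payload = step()
        if tag == "done":
            return payload
        step = payload          # "call": payload is the next step to run

def count_down(n, acc=0):
    # A tail-recursive loop written so that each tail call is handed back
    # to the trampoline as a zero-argument callable.
    if n == 0:
        return ("done", acc)
    return ("call", lambda: count_down(n - 1, acc + 1))

total = trampoline(lambda: count_down(100000))
```

Every bounce costs a return and an indirect call, which is why placing several
procedures in one host-language procedure, so that calls among them become
local jumps, reduces the overhead.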

To demonstrate the practicality of our approach, we have measured its
impact on compile time and code size for benchmarks with sizes up to 75K lines.
Among the benchmarks, knuth-bendix, life, lexgen, mlyacc, and simple are
standard [1]; ratio-regions is integer intensive; tensor is floating-point
intensive; and count-graphs is mostly symbolic.* MLton is the compiler itself, and
kit is the ML-kit [25,24]. The benchmarks were executed on a 450 MHz Intel
Xeon with 1 GB of memory.

In Table 1, we give the number of lines of SML for each benchmark, along
with compile times both under SML/NJ (version 110.9.1)† and MLton. The
number of lines does not include approximately 8000 lines of basis library code
that MLton prefixes to each program. The compile time given for SML/NJ is the
time to batch compile the entire program. In order to improve the performance
of the code generated by SML/NJ, the entire program is wrapped in a local
declaration whose body performs an exportFn. For MLton, we give the total
compile time, the time taken by flow analysis and closure conversion, and the
percentage of compile time spent by gcc to compile the C code.

* ratio-regions was written by Jeff Siskind (qobi@research.nj.nec.com), tensor
was written by Juan Jose Garcia Ripoll (worm@arrakis.es), and count-graphs was
written by Henry Cejtin (henry@clairv.com).

† Except for the kit, which is run under SML/NJ version 110.0.3 because 110.9.1
incorrectly rejects the kit as being ill-typed.

The flow analysis times are shorter than previous work [2,10,4] would suggest,
for several reasons. First, the sets of abstract values are implemented using hash
consing and the binary operations (in particular set union) are cached to avoid
re-computation. Second, because of monomorphisation, running 0CFA on SXML
is equivalent to the polyvariant analysis given in [12]. Thus, it is more precise
than 0CFA performed directly on the (non-monomorphised) source alone, and
hence fewer set operations are performed. Third, the analysis only tracks
higher-order values. Finally, the analysis is less precise for datatypes than the usual
birthplace [13] approach (see rules 4 and 5a in Figure 3). Also, unlike earlier
attempts to demonstrate the feasibility of 0CFA [20], which were limited to small
programs or intramodule analysis, our benchmarks confirm that flow analysis is
practical for programs even in excess of 50K lines.
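
The first of these reasons, hash-consed sets with cached unions, can be sketched as follows. This is an illustration under assumptions and not MLton's data structure: the class `SetTable` and its methods are hypothetical names. Each distinct set of abstract values is interned once and referred to by a small integer, so equality is an integer comparison, and the union of two identifiers is computed at most once.

```python
class SetTable:
    """Hash-consed sets: every distinct set has exactly one integer identifier."""

    def __init__(self):
        self._ids = {}           # frozenset -> identifier
        self._sets = []          # identifier -> frozenset
        self._union_cache = {}   # (smaller id, larger id) -> identifier of the union
        self.union_misses = 0    # number of unions actually computed

    def intern(self, elements):
        key = frozenset(elements)
        set_id = self._ids.get(key)
        if set_id is None:
            set_id = len(self._sets)
            self._ids[key] = set_id
            self._sets.append(key)
        return set_id

    def union(self, a, b):
        if a == b:               # identical identifiers denote identical sets
            return a
        key = (a, b) if a < b else (b, a)
        cached = self._union_cache.get(key)
        if cached is None:
            self.union_misses += 1
            cached = self.intern(self._sets[a] | self._sets[b])
            self._union_cache[key] = cached
        return cached

    def elements(self, set_id):
        return self._sets[set_id]


table = SetTable()
fg = table.intern({"f", "g"})
h = table.intern({"h"})
first = table.union(fg, h)
second = table.union(h, fg)      # served from the cache
print(first == second, table.union_misses)
```

A fixpoint computation repeats the same unions many times, which is where such a cache would pay off.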

MLton’s compile times are longer than SML/NJ’s. However, note that the ratio
of MLton’s to SML/NJ’s compile time does not increase as program size
increases. We believe MLton’s compile time is in practice linear. In fact, gcc is a major
component of MLton’s compile time, especially on large programs. We expect a
native back-end to remove much of this time.
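
The claim about the ratio can be checked directly against the figures reported in Table 1. The short script below is only a convenience for that arithmetic; the dictionary `TABLE_1` copies the lines, SML/NJ, and MLton Total columns, and the function name `compile_time_ratios` is hypothetical.

```python
# (lines of SML, SML/NJ seconds, MLton total seconds), copied from Table 1.
TABLE_1 = {
    "count-graphs": (204, 1.2, 4.02),
    "kit": (73489, 1375.75, 2456.39),
    "knuth-bendix": (606, 2.7, 6.55),
    "lexgen": (1329, 4.5, 19.52),
    "life": (161, 0.9, 3.2),
    "MLton": (47768, 637.5, 1672.0),
    "mlyacc": (7297, 30.1, 144.86),
    "ratio-regions": (627, 2.2, 6.22),
    "simple": (935, 4.7, 34.11),
    "tensor": (2120, 9.7, 10.12),
    "tsp": (495, 0.8, 3.56),
}


def compile_time_ratios(table):
    """Return (program, lines, MLton/SML-NJ ratio) rows sorted by program size."""
    rows = [(name, lines, mlton / smlnj)
            for name, (lines, smlnj, mlton) in table.items()]
    return sorted(rows, key=lambda row: row[1])


for name, lines, ratio in compile_time_ratios(TABLE_1):
    print(f"{name:14s} {lines:6d} lines   MLton/SML-NJ = {ratio:5.2f}")
```

On these figures the two largest programs, MLton and kit, have ratios of about 2.6 and 1.8, which are lower than the ratios of several of the small benchmarks.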

Table 2 gives various dynamic counts for these benchmarks to quantify the 
cost of closure conversion. To make the presentation tractable, the entries are 
in millions per second of the running time of the program. Nonzero entries less 
than .01 are written as ~0. SXML Known and Unknown measure the number of 
known and unknown procedure calls identified in the SXML program using only 
syntactic heuristics [1]. FOL Known indicates the number of known procedure 
calls remaining in the FOL program after flow analysis and all optimizations on 
the FOL program have been performed. The difference between SXML and FOL 
Known is due to inlining and code simplification. Dispatch indicates the number
of case expressions introduced in the FOL program to express procedure calls 
where the flow set is not a singleton. Thus, the difference between Dispatch 
and Unknown gives a rough measure of the effectiveness of flow analysis above 
syntactic analyses in identifying the procedures applied at call-sites. Finally,
Coerce indicates the number of coercions performed on closure tags to ensure
that the closure’s type adheres to the appropriate flow set.

Table 1. Program sizes (lines) and compile times (seconds).

Program        SML lines    SML/NJ  MLton Total  MLton Flow  MLton Convert  MLton gcc%
count-graphs         204       1.2         4.02         .01            .25         38%
kit                73489   1375.75      2456.39        1.34          27.96         82%
knuth-bendix         606       2.7         6.55         .01            .32         47%
lexgen              1329       4.5        19.52         .03            .78         53%
life                 161        .9          3.2         .01            .16         41%
MLton              47768     637.5       1672.0        1.94          33.84         81%
mlyacc              7297      30.1       144.86         .10           2.34         38%
ratio-regions        627       2.2         6.22         .01            .35         34%
simple               935       4.7        34.11         .04            .87         54%
tensor              2120       9.7        10.12         .03            .32         30%
tsp                  495        .8         3.56         .01            .22         30%

Table 2. Dynamic counts (millions/second).

Program         SXML Known  SXML Unknown  FOL Known  FOL Dispatch  Coerce
count-graphs          60.2            ~0        1.0             0       0
kit                   13.1           .11        5.8           .02      ~0
knuth-bendix          28.8            ~0       11.3            ~0       0
lexgen                63.4          2.68       15.4            ~0       0
life                  28.4             0       22.3             0       0
MLton                 14.5           .48        5.2           .34     .01
mlyacc                37.5           .03       10.6            ~0       0
ratio-regions        119.4             0       14.3             0       0
simple                34.2           .26        6.2           .26       0
tensor               140.6            ~0        7.6            ~0       0
tsp                   34.5            ~0        3.4            ~0       0

For most benchmarks, monomorphisation and aggressive syntactic inlining
make most calls known. However, for several of the benchmarks, there still remain
a significant number of unknown calls. Flow analysis uniformly helps in reducing 
this number. Indeed, the number of dispatches caused by imprecision in the 
analysis is always less than 5% of the number of calls executed. Notice also that 
the number of coercions performed is zero for the majority of the benchmarks; 
this means imprecision in the flow analysis rarely results in unwanted merging 
of closures with different representations. 

Table 3 gives runtime results for both SML/NJ and MLton. Of course, be- 
cause the two systems have completely different compilation strategies, optimi- 
zers, backends, and runtime systems, these numbers do not isolate the perfor- 
mance of our closure conversion algorithm. However, they certainly demonstrate 
its feasibility. 



Table 3. Runtimes (in seconds) and ratio of SML/NJ to MLton.

Program         SML/NJ (sec)  MLton (sec)  NJ/MLton
count-graphs            28.8         11.9      2.40
kit                     27.5         30.9       .89
knuth-bendix            44.1         15.2      2.90
lexgen                  52.7         31.8      1.66
life                    51.5         54.2       .95
MLton                  198.7        101.3      1.96
mlyacc                  43.4         20.6      2.11
ratio-regions          122.5         18.9      6.48
simple                  25.3         18.4      1.38
tensor                 154.4         19.8      7.78
tsp                    191.7         25.4      7.54


7 Related Work and Conclusions

Closure conversion algorithms for untyped target languages have been explored 
in detail [1,21]. Algorithms that use a typed target language, however, must solve 
the problem created when procedures of the same type differ in the number and 
types of their free variables. Since closure conversion exposes the types of these 
variables through an explicit environment record, procedures having the same 
source-level type may compile to closures of different types. Minamide et al. [16] 
address this problem by defining a new type system for the target language that 
uses an existential type to hide the environment component of a closure record 
in the closure’s type, exposing the environment only at calls. Unfortunately, the 
target language is more complex than the simply-typed λ-calculus and makes
it difficult to express control-flow information. For example, the type system 
prevents expressing optimizations that impose specialized calling conventions 
for different closures applied at a given call-site. 

An alternative to Minamide et al.’s solution was proposed by Bell et al. [3]. 
Their approach has the benefit of using a simply-typed target language, but 
does not express control-flow information in the target program. Inspired by a 
technique first described by Reynolds [19], they suggest representing closures 
as members of a datatype, with one datatype for each different arrow type in 
the source program. Tolmach and Oliva [26] extend Bell et al. by using a weak 
monovariant flow analysis based on type inference [11]. They refine the closure 
datatypes so that there is one datatype for each equivalence class of procedures 
as determined by unification. Although their approach does express flow ana- 
lysis in a simply-typed target language, it is restricted to flow analyses based 
on unification. We differ from these approaches by using datatype coercions to 
produce a simply-typed target program and in our use of 0CFA.
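
The idea attributed above to Reynolds [19] and Bell et al. [3], closures as members of a datatype with one datatype per arrow type, can be sketched as follows. This is an illustration only, written in Python rather than ML; the names `AddN`, `Scale`, `apply_int_to_int`, and `map_ints` are hypothetical. Each constructor carries the free variables of one source procedure, and a call site becomes a dispatch on the constructor.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class AddN:
    """Closure for `fn x => x + n`; its environment holds n."""
    n: int


@dataclass(frozen=True)
class Scale:
    """Closure for `fn x => x * k + b`; its environment holds k and b."""
    k: int
    b: int


# One datatype for the source arrow type int -> int, even though the two
# procedures have environments of different shapes.
IntToInt = Union[AddN, Scale]


def apply_int_to_int(closure: IntToInt, x: int) -> int:
    """First-order replacement for a call through an int -> int value."""
    if isinstance(closure, AddN):
        return x + closure.n
    if isinstance(closure, Scale):
        return x * closure.k + closure.b
    raise TypeError(f"not an int -> int closure: {closure!r}")


def map_ints(closure: IntToInt, xs: List[int]) -> List[int]:
    """A higher-order function after conversion: it receives data, not code."""
    return [apply_int_to_int(closure, x) for x in xs]


print(map_ints(AddN(3), [1, 2]))
print(map_ints(Scale(2, 1), [1, 2]))
```

With one datatype per arrow type, every call through an `int -> int` value dispatches over all constructors of that type. Refining the datatypes by flow information, as discussed in this section, would shrink each dispatch to the procedures that can actually reach the call site.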

Dimock et al. [5] describe a flow-directed representation analysis that can 
be used to drive closure conversion optimizations. Flow information is encoded 
in the type system through the use of intersection and union types. Like our 
work, their system supports multiple closure representations in a strongly-typed 
context. However, they support only a limited number of representation choices, 
and rely critically on a more complex type system to express these choices. Our 
work also uses flow information to make closure representation decisions, but 
does so within a simply-typed λ-calculus.

Palsberg and O’Keefe [18] define a type system that accepts the same set
of programs as 0CFA viewed as safety analysis. Their type system is based
on simple types, recursive types, and subtyping. Although they do not discuss 
closure conversion, our coercions correspond closely to their use of subtyping. 
By inserting coercions, we remove the need for subtyping in the target language, 
and can use a simpler language based on simple types, sum types, and recursive 
types. 

Our work is also related to other compiler efforts based on typed intermediate 
representations [23,14]. Besides helping to verify the implementation of compi- 
ler optimizations by detecting transformations that violate type safety, typed 
intermediate languages expose representations (through types) useful for code 
generation. For example, datatypes in the target language describe environment 
representations as determined by flow analysis on the source language. Types
therefore provide a useful bridge to communicate information across different 
compiler passes. 

The results of our flow-directed closure conversion translation in MLton de- 
monstrate the following: 

1. First-order simply-typed intermediate languages are an effective tool for
compilation of languages like ML. 

2. The coercions and dispatches introduced by flow-directed closure conversion 
have negligible runtime cost. 

3. Contrary to folklore, 0CFA can be implemented to have negligible compile-
time cost, even for large programs. 

Abstract. Directional types form a type system for logic programs
which is based on the view of a predicate as a directional procedure 
which, when applied to a tuple of input terms, generates a tuple of output 
terms. It is known that directional-type checking wrt. arbitrary types is 
undecidable; several authors proved decidability of the problem wrt. di- 
scriminative regular types. In this paper, using techniques based on tree 
automata, we show that directional-type checking for logic programs wrt.
general regular types is DEXPTIME-complete and fixed-parameter linear.
The latter result shows that despite the exponential lower bound, the 
type system might be usable in practice. 

Keywords: types in logic programming, directional types, regular types, 
tree automata. 



1 Introduction 

It is commonly agreed that types are useful in programming languages. They help
in understanding programs, detecting errors, and automatically performing various
optimizations. Although most logic programming systems are untyped, a lot of
research on types in logic programming has been done [33]. 

Regular types. Probably the most popular approach to types in logic pro- 
gramming uses regular types, which are sets of ground terms recognized by fi- 
nite tree automata (in several papers, including this one, this notion is extended 
to non-ground terms). Intuitively, regular sets are finitely representable sets of 
terms, just as in the case of regular sets of words, which are finitely representable
by finite word automata. 

Actually, almost all type systems occurring in the literature are based on 
some kinds of regular grammars which give a very natural (if not the only) way 
to effectively represent interesting infinite collections of terms that denote e.g. 
lists or other recursive data structures. Of course some of them use extensions 
of regular sets with non-regular domains like numbers (see the discussion in 
Section 3.3), nonground terms (where all types restricted to ground terms are 
regular), polymorphism (where all monotypes, that is, ground instances of
polymorphic types, are regular). Very few systems go further beyond regular sets
(and usually say nothing about computability issues for combining types). 

Prescriptive and descriptive approaches. There are two main streams 
in the research on types in logic programming. In the prescriptive stream the 
user has to provide type declarations for predicates; these declarations form an 
integral part of the program. The system then checks if the program is well- 
typed, that is, if the type declarations are consistent. The present paper falls 
into the prescriptive stream. 

In the descriptive stream the types are inferred by the system and used to 
describe semantic properties of untyped programs. The basic idea here is to over- 
approximate the least model of a given program by a regular set. This approach 
can be found in particular in [30,39,25,19,16,26,21,17], or, using a type-graph 
representation of regular sets (a type graph may be seen as a deterministic top- 
down tree automaton), in [29,28]. An advantage of the descriptive approach is 
that there is no need for type declarations; a disadvantage is that the inferred 
types may not correspond to the intent of the programmer. 

The approximation of the least model of the program by a regular set is 
often not as precise as one would expect. A typical example here is a clause 
append([],L,L). Most of the systems approximate the success set of this clause
by the set of triples ([],x,y) where x and y are any terms and thus lose the
information that the second and third arguments are of the same type. To overcome this
problem, [24,20] introduced approximations based on magic-set transformation 
of the input program. It was observed in [12] that types of the magic-set trans- 
formation of a program coincide with directional types of the initial program as 
they appear in [35,8,4,2,1,3,6,5,7]. 

Directional types. Directional types form a type system for logic programs 
which is based on the view of a predicate as a directional procedure which, when 
applied to a tuple of input terms, generates a tuple of output terms. They first 
occurred in [35] as predicate profiles and in [8] as mode dependencies. Our use 
of the terminology “directional type” stems from [1]. 

Discriminative types. In most type systems for logic programs that are ba- 
sed on regular types, the types are restricted to be discriminative (equivalently, 
path-closed or tuple-distributive or recognizable by deterministic top-down tree 
automata). The reason for that is probably a hope for better efficiency or
conceptual simplicity of such an approach. Unfortunately, discriminative sets are closed
under neither union nor complementation. A union of two discriminative sets 
is then approximated by a least discriminative set that contains both of them, 
but then the distributivity laws for union and intersection do not hold anymore. 
This is very unintuitive and has already led to several wrong results. One of the
results of this paper is that in the context of directional types the restriction to 
discriminative types, at least theoretically, does not pay: the exponential lower 
bound for the discriminative case is matched by the exponential upper bound 
for the general case. In fact, as shown in [14], even the stronger restriction to unary
types (where all paths in a tree are disconnected from each other, see e.g. [39])
is not worth it, since the type-checking problem (even for non-directional types)
remains DEXPTIME-hard.

Complexity of type checking. The exponential lower bound of the
type-checking problem looks quite discouraging at first sight. A closer look at the
proof of the lower bound shows that the program used there was very short while
the types were quite large (the encoding of the Turing-machine computation was
done in the types, not in the program). The situation in practice usually looks
quite the opposite: type checking is usually applied to large programs
and rather small types. This led us to study the parameterized complexity of the
problem, where the parameter is given by the length of the type to be checked.
We obtained here quite a nice result: the problem is fixed-parameter linear,
which means that for a fixed family of types the type-checking problem can be
decided in time linear in the size of the program. This shows that there is a good
potential for the type system to be practical. A similar phenomenon is already
known for the functional language ML, where bad theoretical lower bounds do
not match the good practical behaviour of the type system. The explanation was
given by Henglein [27], who showed that typability by types of size bounded by
a constant is decidable in polynomial time.

Related work. It is pointed out in [1] that the type checking problem is unde- 
cidable for arbitrary directional types. Therefore Aiken and Lakshman restrict 
themselves to regular directional types. Although their algorithm for automatic 
type checking is sound for general regular types, it is sound and complete only for 
discriminative ones. It is based on solving negative set constraints and thus runs 
in nondeterministic exponential time. Another algorithm (without complexity 
analysis) for type-checking for discriminative directional types is given in [5]. 
In [12] it is proved that directional type checking wrt. discriminative types is 
DEXPTIME-complete and an algorithm for inferring (regular, not necessarily 
discriminative) directional types is given. 

Rychlikowski and Truderung [36] proposed recently a system of polymorphic 
directional types. The types there are incomparable with ours: on one hand 
they are more general because of the use of the polymorphism; on the other 
hand they are even more restricted than regular discriminative types (e.g. they 
are not able to express lists of an even length). The authors presented a type-
checking algorithm working in DEXPTIME, but probably the most interesting
feature of this system is the inference of so-called main type of a predicate — 
the type that provides a compact representation of all types of the predicate. 

Our results. The methods used in the mentioned papers are not strong 
enough to prove the decidability of directional type checking wrt. general re- 
gular types. In this paper, using tree-automata techniques, we prove that this 
problem is decidable in DEXPTIME, which, together with the result from [12]
stating DEXPTIME-hardness, establishes DEXPTIME-completeness of the problem.
Moreover, we show that the problem is fixed-parameter linear - our procedure is 
exponential in the size of the input types, but linear in the size of the program. 
This improves the results by Aiken and Lakshman [1], Boye [5], and Charatonik 
and Podelski [12], where decidability is restricted to discriminative types. 
Decidability of directional type checking wrt. general regular types already
has quite a long history. It was first proved in [11] by a reduction to the
encompassment theory [9]. The result was not satisfactory because of the complexity of
the obtained procedure: several (around five) times exponential. In [10] we found
another solution based on a different kind of automata and reduced the
complexity to NEXPTIME. The proof presented here is a refinement of the argument
from [10].

2 Preliminaries 

If $\Sigma$ is a signature (that is, a set of function symbols) and $\mathit{Var}$ is a set of variables,
then $T_\Sigma$ is the set of ground terms and $T_\Sigma(\mathit{Var})$ is the set of non-ground terms
over $\Sigma$ and $\mathit{Var}$. We write $\mathit{Var}(t)$ for the set of variables occurring in the term $t$.
The notation $|S|$ is used, depending on the context, for the cardinality of the set
$S$ or for the size of the object $S$ (that is, the length of the word encoding $S$).



2.1 Tree Automata 

The methods we use are based on tree-automata techniques. Standard techni- 
ques as well as all well-known results that we mention here can be found e.g. 
in [22,15,38]. Below we recall basic notions in this area. 

Definition 1 (Tree automaton). A tree automaton is a tuple $\mathcal{A} =
\langle \Sigma, Q, \Delta, F \rangle$ where $\Sigma, Q, \Delta, F$ are finite sets such that

– $\Sigma$ is a signature,

– $Q$ is a finite set of states,

– $\Delta$ is a set of transitions of the form $f(q_1, \ldots, q_n) \to q$ where $f \in \Sigma$,
$q, q_1, \ldots, q_n \in Q$ and $n$ is the arity of $f$,

– $F \subseteq Q$ is a set of final states.

The automaton $\mathcal{A}$ is called

– bottom-up deterministic, if for all $f \in \Sigma$ and all sequences $q_1, \ldots, q_n \in Q$
there exists at most one $q \in Q$ such that $f(q_1, \ldots, q_n) \to q \in \Delta$,

– top-down* deterministic, if $|F| = 1$ and for all $f \in \Sigma$ and all $q \in Q$ there
exists at most one sequence $q_1, \ldots, q_n \in Q$ such that $f(q_1, \ldots, q_n) \to q \in \Delta$,

– complete, if for all $f \in \Sigma$ and all sequences $q_1, \ldots, q_n \in Q$ there exists at
least one $q \in Q$ such that $f(q_1, \ldots, q_n) \to q \in \Delta$.

A tree automaton $\mathcal{A} = \langle \Sigma, Q, \Delta, F \rangle$ translates to a logic program containing a
clause $q(f(x_1, \ldots, x_n)) \leftarrow q_1(x_1), \ldots, q_n(x_n)$ for each transition $f(q_1, \ldots, q_n) \to
q \in \Delta$, where one is interested only in queries about the predicates in $F$.

* Intuitively, a top-down automaton reads trees top-down, and thus $F$ is here the set
of initial (not final) states.
Definition 2 (Run). A run of a tree automaton $\mathcal{A} = \langle \Sigma, Q, \Delta, F \rangle$ on a tree $t \in
T_\Sigma$ is a mapping $\rho$ assigning a state to each occurrence of a subterm $f(t_1, \ldots, t_n)$
of $t$ such that

$$f(\rho(t_1), \ldots, \rho(t_n)) \to \rho(f(t_1, \ldots, t_n)) \in \Delta.$$

A run $\rho$ on $t$ is successful if $\rho(t) \in F$.

Sometimes we will refer to runs over terms in $T_{\Sigma \cup Q}$. We then extend the
definition above by putting $\rho(q) = q$ for all states in $Q$.

If there exists a successful run on a tree t then we say that the automaton 
accepts, or recognizes, t. The set of all trees accepted by an automaton A, 
denoted $\mathcal{L}(\mathcal{A})$, is called the language of the automaton $\mathcal{A}$, or the set recognized
by this automaton. A set of trees is called regular if it is recognized by some tree 
automaton. 

A state $q$ of the automaton $\mathcal{A}$ is called (bottom-up) reachable if there exists
a tree $t \in T_\Sigma$ and a run $\rho$ of $\mathcal{A}$ on $t$ such that $\rho(t) = q$.
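
The definitions of run and acceptance can be made concrete with a small sketch. The representation is an assumption chosen for illustration: a term is a nested tuple whose first component is the function symbol, and the transitions are a dictionary from a symbol and a tuple of child states to a set of target states. The names `states_of`, `accepts`, and the list automaton `LIST_DELTA` are hypothetical.

```python
from itertools import product


def states_of(term, delta):
    """All states that some run can assign to the root of `term`."""
    symbol, *children = term
    child_state_sets = [states_of(child, delta) for child in children]
    result = set()
    # Try every combination of states the children can reach; for a constant
    # there are no children and product() yields the single empty tuple.
    for child_states in product(*child_state_sets):
        result |= delta.get((symbol, child_states), set())
    return result


def accepts(term, delta, final):
    """A term is accepted if some run assigns a final state to its root."""
    return bool(states_of(term, delta) & final)


# Hypothetical automaton recognizing lists whose elements are the constant a.
LIST_DELTA = {
    ("nil", ()): {"list"},
    ("a", ()): {"elem"},
    ("cons", ("elem", "list")): {"list"},
}

good = ("cons", ("a",), ("cons", ("a",), ("nil",)))
bad = ("cons", ("nil",), ("a",))
print(accepts(good, LIST_DELTA, {"list"}), accepts(bad, LIST_DELTA, {"list"}))
```

Collecting the set of possible root states, rather than searching for a single run, handles nondeterministic automata without backtracking.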

It is well-known (cf. [22,15,38]) that regular languages are closed under
Boolean operations: one can effectively construct in polynomial time an automaton
that recognizes the union or intersection of two given regular languages, and in
exponential time an automaton that recognizes the complement. In polynomial time one
can compute the set of reachable states and thus test emptiness of the language
recognized by a given automaton. In exponential time one can determinize an
automaton, that is, construct a bottom-up deterministic automaton that
recognizes the same set. Tree automata are not top-down determinizable.
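
The emptiness test mentioned above amounts to a least-fixpoint computation of the reachable states. The sketch below is a simple polynomial-time version, not an optimized one; the transition encoding (a dictionary from a symbol and a tuple of child states to a set of target states) and the names `reachable_states` and `is_empty` are assumptions made for illustration.

```python
def reachable_states(delta):
    """Least set of states closed under the transitions in `delta`."""
    reached = set()
    changed = True
    while changed:
        changed = False
        for (_symbol, child_states), targets in delta.items():
            # A transition fires once all of its child states are reachable.
            if all(q in reached for q in child_states) and not targets <= reached:
                reached |= targets
                changed = True
    return reached


def is_empty(delta, final):
    """The recognized language is empty iff no final state is reachable."""
    return not (reachable_states(delta) & final)


# q1 is never produced, so the transition for f can never fire.
STUCK = {("a", ()): {"q0"}, ("f", ("q0", "q1")): {"q"}}
# Adding a second transition for a makes q reachable.
LIVE = {("a", ()): {"q0", "q1"}, ("f", ("q0", "q1")): {"q"}}

print(is_empty(STUCK, {"q"}), is_empty(LIVE, {"q"}))
```

Each pass over the transitions either adds a state or terminates the loop, so the number of passes is bounded by the number of states.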

Example 1. Consider the automaton $\mathcal{A} = \langle \{a, f\}, \{q_0, q_1, q\}, \{a \to q_0, a \to
q_1, f(q_0, q_1) \to q\}, \{q\} \rangle$. The run that assigns $q_0$ to the first occurrence of $a$,
$q_1$ to the second occurrence of $a$ and $q$ to $f(a, a)$ is a successful run of $\mathcal{A}$ on
$f(a, a)$. The automaton $\mathcal{A}$ is top-down deterministic, is not bottom-up
deterministic, and is not complete.
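
The three properties claimed in Example 1 can be checked mechanically against Definition 1. In the sketch below the automaton is encoded as a list of (symbol, child states, target state) triples; this encoding and the function names are assumptions made for illustration.

```python
from itertools import product

SIGMA = {"a": 0, "f": 2}                      # symbol -> arity
STATES = {"q0", "q1", "q"}
DELTA = [("a", (), "q0"), ("a", (), "q1"), ("f", ("q0", "q1"), "q")]
FINAL = {"q"}


def bottom_up_deterministic(delta):
    """At most one target state for each symbol and sequence of child states."""
    seen = {}
    for symbol, children, target in delta:
        if seen.setdefault((symbol, children), target) != target:
            return False
    return True


def top_down_deterministic(delta, final):
    """One final state, and at most one child sequence per symbol and target."""
    if len(final) != 1:
        return False
    seen = {}
    for symbol, children, target in delta:
        if seen.setdefault((symbol, target), children) != children:
            return False
    return True


def complete(sigma, states, delta):
    """Every symbol applied to every sequence of states has some transition."""
    left_hand_sides = {(symbol, children) for symbol, children, _ in delta}
    return all((symbol, children) in left_hand_sides
               for symbol, arity in sigma.items()
               for children in product(sorted(states), repeat=arity))


print(top_down_deterministic(DELTA, FINAL),
      bottom_up_deterministic(DELTA),
      complete(SIGMA, STATES, DELTA))
```

The automaton fails bottom-up determinism because $a$ has two targets, and fails completeness because, for instance, there is no transition for $f(q_0, q_0)$.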

2.2 Directional Types 

By logic programs we mean definite Horn-clause programs (pure Prolog
programs). For the sake of simplicity we assume that all predicate symbols
occurring in this paper are unary (there is no loss of generality since function
symbols may be used to form tuples). The set of predicate symbols occurring in
a program $\mathcal{P}$ is denoted $\mathit{Pred}(\mathcal{P})$ or simply $\mathit{Pred}$ if $\mathcal{P}$ is clear from the context.
For a program $\mathcal{P}$, $\mathit{lm}(\mathcal{P})$ denotes its least model. For $p \in \mathit{Pred}(\mathcal{P})$ we define

$$[p]_{\mathcal{P}} = \{t \mid p(t) \in \mathit{lm}(\mathcal{P})\}.$$

A type is a set of terms closed under substitution [2]. A ground type is a set of
ground terms (i.e., trees), and thus a special case of a type. A term $t$ has type $T$,
in symbols $t : T$, if $t \in T$. A type judgment is an implication $t_1 : T_1 \wedge \ldots \wedge t_n : T_n \to
t_0 : T_0$. We say that such a judgment holds if the implication $t_1\theta \in T_1 \wedge \ldots \wedge t_n\theta \in
T_n \to t_0\theta \in T_0$ is true for all term substitutions $\theta : \mathit{Var} \to T_\Sigma(\mathit{Var})$.

We recall that a set of ground terms is regular if it can be defined by a finite
tree automaton (or, equivalently, by a ground set expression as in [1] or a regular
grammar as in [16]). The definition below coincides with the types used in [1]; it
extends the definition from [16] by allowing non-ground types, and is equivalent
to the definition from [5].

Definition 3 (Regular type). A type is regular if it is of the form $\mathit{Sat}(T)$ for
a regular set $T$ of ground terms, where the set $\mathit{Sat}(T)$ of terms satisfying $T$ is
the type

$$\mathit{Sat}(T) = \{t \in T_\Sigma(\mathit{Var}) \mid \theta(t) \in T \text{ for all ground substitutions } \theta : \mathit{Var} \to T_\Sigma\}.$$



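Definition 4 (Directional type of a program [8,1]). A directional type of
a program $\mathcal{P}$ is a family

$$\mathcal{T} = (I_p \to O_p)_{p \in \mathit{Pred}}$$

assigning to each predicate $p$ of $\mathcal{P}$ an input type $I_p$ and an output type $O_p$ such
that, for each clause $p_0(t_0) \leftarrow p_1(t_1), \ldots, p_n(t_n)$ of $\mathcal{P}$, the following type
judgments hold.

$$\begin{array}{l}
t_0 : I_{p_0} \to t_1 : I_{p_1} \\
t_0 : I_{p_0} \wedge t_1 : O_{p_1} \to t_2 : I_{p_2} \\
\quad\vdots \\
t_0 : I_{p_0} \wedge t_1 : O_{p_1} \wedge \ldots \wedge t_{n-1} : O_{p_{n-1}} \to t_n : I_{p_n} \\
t_0 : I_{p_0} \wedge t_1 : O_{p_1} \wedge \ldots \wedge t_n : O_{p_n} \to t_0 : O_{p_0}
\end{array}$$

We then also say that $\mathcal{P}$ is well-typed wrt. $\mathcal{T}$.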

Following [1] we define that a query $q_1(t_1), \ldots, q_n(t_n)$ is well-typed if for all
$1 \le j \le n$ the judgment $\bigwedge_{1 \le k < j} t_k : O_{q_k} \to t_j : I_{q_j}$ holds. It is then easy to
see that “well-typed programs do not go wrong” as defined in [31]. Namely, an
application of one step of SLD-resolution to a well-typed query results always 
in a new well-typed query. This does not say, however, anything about whether 
the query succeeds, fails or loops. 

The definition above refers to the operational semantics of logic programs 
based on left to right execution. There is also a more declarative (cf. [32], see 
also Theorem 1) intuition behind it: Intuitively, the judgments say that if a 
query has the correct input type and its call terminates successfully, then the 
computed answer has the correct output type. 

Definition 5 (Type checking). The type-checking problem is to decide for a
given program $\mathcal{P}$ and directional type $\mathcal{T}$, whether $\mathcal{P}$ is well-typed wrt. $\mathcal{T}$.

A program can have many directional types. For example, consider the
predicate $\mathit{append}$ defined by

$$\begin{array}{l}
\mathit{append}([\,], L, L). \\
\mathit{append}([X|Xs], Y, [X|Z]) \leftarrow \mathit{append}(Xs, Y, Z).
\end{array}$$

We can give this predicate the directional type $(\mathit{list}, \mathit{list}, \top) \to (\mathit{list}, \mathit{list}, \mathit{list})$,
where $\mathit{list}$ denotes the set of all lists and $\top$ is the set of all terms, but also
$(\top, \top, \mathit{list}) \to (\mathit{list}, \mathit{list}, \mathit{list})$, as well as $(\top, \top, \top) \to (\top, \top, \top)$. (Recall that
$\mathit{append}$ is seen as a unary predicate here, and $(\mathit{list}, \mathit{list}, \mathit{list})$ is a set of terms
which are triples of lists.) A predicate defined by a single fact $p(X)$ has a
directional type $\tau \to \tau$ for all types $\tau$.

Example 2. We show that $(\mathit{list}, \mathit{list}, \top) \to (\mathit{list}, \mathit{list}, \mathit{list})$ well-types the
predicate $\mathit{append}$ defined above. For the first clause, $\mathit{append}([\,], L, L)$, we have to show
only one judgment, namely

$$\mathit{append}([\,], L, L) : (\mathit{list}, \mathit{list}, \top) \to \mathit{append}([\,], L, L) : (\mathit{list}, \mathit{list}, \mathit{list}).$$

The condition we have to check here is a tautology: the assumption that $[\,]$ is a
member of $\mathit{list}$ implies that $[\,]$ is a member of $\mathit{list}$, and the assumption that $L$ is
a member of both $\mathit{list}$ and $\top$ implies that $L$ is a member of $\mathit{list}$. For the second
clause, $\mathit{append}([X|Xs], Y, [X|Z]) \leftarrow \mathit{append}(Xs, Y, Z)$, we have two judgments:

$$\mathit{append}([X|Xs], Y, [X|Z]) : (\mathit{list}, \mathit{list}, \top) \to \mathit{append}(Xs, Y, Z) : (\mathit{list}, \mathit{list}, \top),$$

and

$$\begin{array}{l}
\mathit{append}([X|Xs], Y, [X|Z]) : (\mathit{list}, \mathit{list}, \top) \wedge \mathit{append}(Xs, Y, Z) : (\mathit{list}, \mathit{list}, \mathit{list}) \\
\quad \to \mathit{append}([X|Xs], Y, [X|Z]) : (\mathit{list}, \mathit{list}, \mathit{list}).
\end{array}$$

The first one follows from the observation that if $[X|Xs]$ is a list then $Xs$ is a
list. The second, from the observation that if $Z$ is a list then $[X|Z]$ is a list.

A similar reasoning can be used to show that $((\mathit{list}, \mathit{list}, \top) \cup (\top, \top, \mathit{list})) \to
(\mathit{list}, \mathit{list}, \mathit{list})$ well-types $\mathit{append}$. Then both $\mathit{append}([a, b, X], [c], L)$ and
$\mathit{append}(X, Y, [a, b, c])$ are well-typed queries while $\mathit{append}([a], X, Y)$ is not.



We do not use discriminative types in this paper. We include the definition 
below to show what the contribution of the paper is. The notion of a path-closed 
set below originates from [22]. It is equivalent to other notions occurring in the 
literature: tuple-distributive [30,35], discriminative [1], or deterministic. 

Definition 6 (Discriminative type). A regular set of ground terms is called
path-closed if it can be defined by a deterministic top-down tree automaton. A
directional type is called discriminative if it is of the form

$$(\mathit{Sat}(I_p) \to \mathit{Sat}(O_p))_{p \in \mathit{Pred}},$$

where the sets $I_p$, $O_p$ are path-closed.

A deterministic top-down tree automaton translates to a logic program which
does not contain two different clauses with the same head (modulo variable
renaming), e.g., $p(f(x_1, \ldots, x_n)) \leftarrow p_1(x_1), \ldots, p_n(x_n)$ and $p(f(x_1, \ldots, x_n)) \leftarrow
p'_1(x_1), \ldots, p'_n(x_n)$. A discriminative set expression as defined in [1] translates
to a deterministic finite tree automaton, and vice versa. That is, discriminative
set expressions denote exactly path-closed regular sets. It is argued in [1] that
discriminative set expressions are quite expressive and are used to express
commonly used data structures. Note that lists, for example, can be defined by the
program with the two clauses $\mathit{list}(\mathit{cons}(x, y)) \leftarrow \mathit{list}(y)$ and $\mathit{list}(\mathit{nil})$.

There are, however, many regular types which are not discriminative. The
simplest is the set $\{f(a, a), f(b, b)\}$. Another simple example of a regular but not
path-closed set is given in Example 2: it is the set consisting of triples $(x, y, z)$
where either $x$ and $y$ are lists and $z$ is any term or $x$ and $y$ are any terms and
$z$ is a list (which is useful for typing the predicate $\mathit{append}$ used either for
concatenating the lists $x$ and $y$ or for splitting the list $z$).

The use of general regular types also has other advantages: it gives us
overloading for free. For example, if an operator like $+$ is used for the addition of both
integers and reals, the corresponding automaton may simply have both
transitions $+(\mathit{int}, \mathit{int}) \to \mathit{expr}$ and $+(\mathit{real}, \mathit{real}) \to \mathit{expr}$.

Further motivation for studying regular but not discriminative types
comes from program verification. Several papers, including [13,23,34], modeled
transition systems as logic programs. In many cases safety properties can be
tested by type checking: it is enough to prove that some predicates have
types of the form $\mathit{Goodstates} \to \mathit{Goodstates}$ where $\mathit{Goodstates}$ is a set
which does not contain unsafe states. For example, if we reason about
mutual exclusion of two concurrent processes, the set $\mathit{Goodstates}$ could
contain three terms: $\mathit{state}(\mathit{noncritical}, \mathit{noncritical})$, $\mathit{state}(\mathit{noncritical}, \mathit{critical})$ and
$\mathit{state}(\mathit{critical}, \mathit{noncritical})$. However, any discriminative set containing both
terms $\mathit{state}(\mathit{noncritical}, \mathit{critical})$ and $\mathit{state}(\mathit{critical}, \mathit{noncritical})$ must also
contain the term $\mathit{state}(\mathit{critical}, \mathit{critical})$ and thus we cannot verify mutual
exclusion within such a type system. It is known (cf. [13]) that regular (not limited
to discriminative) sets can capture all temporal properties expressible in the
logic CTL for all finite systems as well as for some infinite ones, like pushdown
or some parameterized systems. Since most model-checkers are limited to
finite-state systems, there is a good potential for applications of the logic-programming
approach to the infinite case. But to apply a type system for verification we need
the full power of regular sets.
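
The mutual-exclusion argument can be replayed on a finite set of terms. The sketch below computes the smallest tuple-distributive (path-closed) superset of a finite set of ground terms by closing each argument position independently and recombining the results. Terms are encoded as nested tuples and the helper name `path_closure` is hypothetical; the sketch is meant only to illustrate why the unsafe state is forced into any discriminative set that contains the safe ones.

```python
from itertools import product


def path_closure(terms):
    """Smallest tuple-distributive superset of a finite set of ground terms."""
    arguments_by_symbol = {}
    for symbol, *args in terms:
        arguments_by_symbol.setdefault((symbol, len(args)), []).append(args)
    closed = set()
    for (symbol, arity), argument_lists in arguments_by_symbol.items():
        # Close each argument position on its own, ignoring which arguments
        # occurred together in the original terms.
        columns = [path_closure({args[i] for args in argument_lists})
                   for i in range(arity)]
        for combination in product(*columns):
            closed.add((symbol, *combination))
    return closed


NONCRITICAL, CRITICAL = ("noncritical",), ("critical",)
good_states = {
    ("state", NONCRITICAL, NONCRITICAL),
    ("state", NONCRITICAL, CRITICAL),
    ("state", CRITICAL, NONCRITICAL),
}

closure = path_closure(good_states)
print(("state", CRITICAL, CRITICAL) in closure, len(closure))
```

The closure forgets the correlation between the two arguments, which is exactly the information a mutual-exclusion property depends on.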



3 Directional Type Checking 

In this section we prove that the directional type checking for logic programs 
wrt. general regular types is DEXPTIME-complete and fixed-parameter linear. 

We start by recalling a technique used in [12]. We transform the
well-typedness condition in Definition 4 into a logic program $\mathcal{P}_{InOut}$ by replacing
$t : I_p$ with the atom $p^{in}(t)$ and $t : O_p$ with $p^{out}(t)$.

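Definition 7 ($\mathcal{P}_{InOut}$, the type program for $\mathcal{P}$). Given a program $\mathcal{P}$, the
corresponding type program $\mathcal{P}_{InOut}$ defines an in-predicate $p^{in}$ and an
out-predicate $p^{out}$ for each predicate $p$ of $\mathcal{P}$. Namely, for every clause $p_0(t_0) \leftarrow
p_1(t_1), \ldots, p_n(t_n)$ in $\mathcal{P}$, $\mathcal{P}_{InOut}$ contains the $n$ clauses defining in-predicates
corresponding to each atom in the body of the clause,

$$\begin{array}{l}
p_1^{in}(t_1) \leftarrow p_0^{in}(t_0) \\
\quad\vdots \\
p_n^{in}(t_n) \leftarrow p_0^{in}(t_0), p_1^{out}(t_1), \ldots, p_{n-1}^{out}(t_{n-1})
\end{array}$$

and the clause defining the out-predicate corresponding to the head of the clause,

$$p_0^{out}(t_0) \leftarrow p_0^{in}(t_0), p_1^{out}(t_1), \ldots, p_n^{out}(t_n).$$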

The program above is known in the literature as a magic-set transformation
of the initial program $\mathcal{P}$. It was used (among other things) to obtain more
precise information about answers computed by the program if the queries are
restricted to some specific form. If we denote by $\mathcal{P}_{in}$ a program that defines some
$p^{in}$ predicates (intuitively, the queries to the program $\mathcal{P}$ are then restricted to
those defined in the program $\mathcal{P}_{in}$), then it is easy to observe that

$$[\![p^{out}]\!]_{\mathcal{P}_{in} \cup \mathcal{P}_{InOut}} = [\![p]\!]_{\mathcal{P}} \cap [\![p^{in}]\!]_{\mathcal{P}_{in} \cup \mathcal{P}_{InOut}}$$

which intuitively means that an atom $p^{out}(t)$ is in the least model of the
transformed program if and only if $p(t)$ is in the least model of the initial program
and $p^{in}(t)$ is allowed as a query.
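
As an illustration, the following sketch generates the clauses of a type program from a single clause. It assumes the usual left-to-right form of the transformation, in which the in-predicate of the i-th body atom depends on the in-predicate of the head and on the out-predicates of the preceding body atoms. The encoding of atoms as (predicate, argument string) pairs and the `_in`/`_out` suffixes are hypothetical choices made for this example.

```python
def type_program(clause):
    """Return the in/out clauses for one clause of the source program.

    clause is (head, body); every atom is a pair (predicate, arguments).
    Each result is a pair (head_atom, list_of_body_atoms).
    """
    head, body = clause
    p0, t0 = head
    head_in = (p0 + "_in", t0)
    clauses = []
    for i, (p, t) in enumerate(body):
        # in-predicate of the i-th body atom: the head's input holds and
        # all earlier body atoms have already produced their output
        premises = [head_in] + [(q + "_out", s) for q, s in body[:i]]
        clauses.append(((p + "_in", t), premises))
    # out-predicate of the head: input of the head plus all body outputs
    clauses.append(
        ((p0 + "_out", t0), [head_in] + [(q + "_out", s) for q, s in body])
    )
    return clauses


append_rec = (("append", "[X|Xs],Ys,[X|Zs]"), [("append", "Xs,Ys,Zs")])
generated = type_program(append_rec)
```

For the recursive clause of `append`, the result consists of one clause for `append_in` and one for `append_out`; a fact yields only the single clause relating its out-predicate to its in-predicate.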

The following theorem is proved in [12]. Essentially, it says that a directional
type of the form $T = (Sat(I_p) \to Sat(O_p))_{p \in Pred}$, for ground types $I_p, O_p \subseteq T_\Sigma$,
satisfies the required type judgments if and only if the corresponding directional
ground type $T_g = (I_p \to O_p)_{p \in Pred}$ does.

Theorem 1 (Types and models of type programs). A program $\mathcal{P}$ is well-typed
wrt. the directional type

$$T = (Sat(I_p) \to Sat(O_p))_{p \in Pred}$$

(with ground types $I_p$, $O_p$) if and only if the subset of the Herbrand base
corresponding to $T$,

$$M_T = \{p^{in}(t) \mid t \in I_p\} \cup \{p^{out}(t) \mid t \in O_p\},$$

is a model of the type program $\mathcal{P}_{InOut}$.

Note that the theorem above connects directional types with arbitrary models
of the type program, not only with the least model. Since every clause in
this program contains occurrences of $p^{in}$ predicates in its body and there are no
facts defining these predicates, the least model is empty, which corresponds to the
trivial directional type $\emptyset \to \emptyset$ (and expresses that a program without input does
not produce output). At the other extreme we have the whole Herbrand base,
which is also a model of the type program and corresponds to the trivial type
$T_\Sigma \to T_\Sigma$.
3.1 Exponential Upper Bound

Note that a subset of the Herbrand base is a model of a logic program if and only 
if it is a model of each clause of the program. Thus, as an immediate consequence 
of Theorem 1 above we obtain that the type-checking problem for directional 
types reduces to the following model-checking problem. 

Problem 1. Given a clause $p_0(t_0) \leftarrow p_1(t_1), \ldots, p_n(t_n)$ and a family of regular
sets $T_{p_0}, T_{p_1}, \ldots, T_{p_n}$, decide whether the set $\bigcup_{i=0}^{n} \{p_i(t) \mid t \in T_{p_i}\}$ is a model of
the clause.

The problem above is closely related to another problem known in the theory
of tree automata (in particular, for reasoning about ground reducibility, see [15]),
namely, whether for a given term $t$ and regular set $T$ there exists a ground instance
of $t$ in $T$. We use similar techniques to prove its decidability. However, since we want
to analyze its complexity carefully, we find it easier to present a direct proof
rather than to find a suitable reduction. For the decidability proof we need the
following lemma.

Lemma 1. Let $A_i = (\Sigma, Q_i, \Delta_i, F_i)$ for $i = 0, \ldots, n$ be tree automata with
disjoint sets of states, and let $\# \notin \Sigma$ be a fresh function symbol of arity $n + 1$.
There exists a tree automaton $A = (\Sigma \cup \{\#\}, Q, \Delta, F)$ such that

- $A$ is bottom-up deterministic, and

- all states of $A$ are reachable, and

- $A$ recognizes the set $\#(T_\Sigma - \mathcal{L}(A_0), \mathcal{L}(A_1), \ldots, \mathcal{L}(A_n))$, and

- $A$ can be effectively constructed from $A_0, \ldots, A_n$ in single exponential time.

Proof. The idea of the proof below is to use standard complementation and
determinization methods to construct an automaton $A' = (\Sigma \cup \{\#\}, Q', \Delta', F')$
that satisfies all conditions except reachability of states. The only problem here
is that we have to complement and determinize at the same time to avoid a
doubly-exponential blowup. Then we obtain $A$ by removing non-reachable states
from $A'$. The detailed construction is as follows.

We can assume that $A_0$ is a complete automaton; otherwise we can simply
add a new non-final state $q$ (a so-called "dead state") to $Q_0$ and all possible
transitions with $q$ on the right-hand side to $\Delta_0$.

Let $Q' = 2^{Q_0 \cup \ldots \cup Q_n} \cup \{s_{fin}\}$ be the powerset of $Q_0 \cup \ldots \cup Q_n$ plus one
additional state $s_{fin}$, which is the only final state of $A'$, that is, $F' = \{s_{fin}\}$. For
$s_1, \ldots, s_k \in Q'$ and $k$-ary $f \in \Sigma$ we define that $f(s_1, \ldots, s_k) \to s \in \Delta'$ if $s$ is the
set

$$\{q \in Q_0 \cup \ldots \cup Q_n \mid \exists q_1 \in s_1 \ldots \exists q_k \in s_k \;\; f(q_1, \ldots, q_k) \to q \in \Delta_0 \cup \ldots \cup \Delta_n\}.$$

For $s_0, \ldots, s_n \in Q'$ we define that $\#(s_0, \ldots, s_n) \to s_{fin} \in \Delta'$ if

$$s_0 \cap F_0 = \emptyset, \quad s_1 \cap F_1 \neq \emptyset, \quad \ldots, \quad s_n \cap F_n \neq \emptyset.$$

Finally we define $Q$ as the set of reachable states from $Q'$ (it is well known
that reachability for tree automata can be tested in polynomial time), $\Delta$ as the
restriction of $\Delta'$ to $Q$, and $F$ as $F'$.

The correctness of the construction follows immediately from the observation
that for $i = 0, \ldots, n$, the automaton

$$A^i = (\Sigma \cup \{\#\}, Q, \Delta, \{s \in Q \mid s \cap F_i \neq \emptyset\})$$

recognizes exactly the set $\mathcal{L}(A_i)$, and restricted to $\Sigma$ is complete. □
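
The construction can be sketched as follows. This is an illustrative sketch rather than the construction as stated: it builds only the reachable subset-states on the fly (instead of building the full powerset and pruning it), it lets the empty subset play the role of the dead state, and it computes the `#` transition on demand instead of storing it. All names and the encoding of automata as dictionaries are assumptions of this example.

```python
from itertools import product

S_FIN = "s_fin"  # the single extra final state of the construction


def determinize(signature, automata):
    """Subset construction over the union of all transition relations.

    signature: dict mapping each symbol of Sigma to its arity.
    automata:  list of (delta, finals); delta maps (symbol, tuple_of_states)
               to a set of target states, and the state sets are disjoint.
    Returns the reachable subset-states and the deterministic transitions.
    """
    merged = {}
    for delta, _finals in automata:
        for key, targets in delta.items():
            merged.setdefault(key, set()).update(targets)

    def step(symbol, subsets):
        result = set()
        for states in product(*subsets):
            result |= merged.get((symbol, states), set())
        return frozenset(result)

    reachable, det_delta = set(), {}
    changed = True
    while changed:  # saturate: only reachable subsets are ever created
        changed = False
        for symbol, arity in signature.items():
            for subsets in product(list(reachable), repeat=arity):
                if (symbol, subsets) not in det_delta:
                    target = step(symbol, subsets)
                    det_delta[(symbol, subsets)] = target
                    reachable.add(target)
                    changed = True
    return reachable, det_delta


def hash_target(subsets, finals):
    """s_fin iff s_0 misses F_0 and every other s_i meets F_i, else None."""
    ok = not (subsets[0] & finals[0]) and all(
        s & f for s, f in zip(subsets[1:], finals[1:])
    )
    return S_FIN if ok else None


def run(term, det_delta, finals):
    """Evaluate a ground term bottom-up; '#' may only occur at the root."""
    symbol, *children = term
    states = tuple(run(child, det_delta, finals) for child in children)
    if symbol == "#":
        return hash_target(states, finals)
    return det_delta[(symbol, states)]


# Example: A_0 accepts even numerals, A_1 accepts numerals >= 1.
SIGNATURE = {"zero": 0, "succ": 1}
A0 = ({("zero", ()): {"e0"}, ("succ", ("e0",)): {"o0"},
       ("succ", ("o0",)): {"e0"}}, {"e0"})
A1 = ({("zero", ()): {"z1"}, ("succ", ("z1",)): {"p1"},
       ("succ", ("p1",)): {"p1"}}, {"p1"})
REACHABLE, DET_DELTA = determinize(SIGNATURE, [A0, A1])
FINALS = [A0[1], A1[1]]
ZERO = ("zero",)
ONE = ("succ", ZERO)
```

In this example the term `("#", ONE, ONE)` reaches `S_FIN`, because one is not even but is at least one, whereas `("#", ZERO, ONE)` does not.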

Decidability of Problem 1. Let the clause $p_0(t_0) \leftarrow p_1(t_1), \ldots, p_n(t_n)$ and
the family of regular sets $T_{p_0}, T_{p_1}, \ldots, T_{p_n}$ be an instance of Problem 1. We
did not specify here the formalism in which the sets $T_{p_0}, \ldots, T_{p_n}$ are given, but
without loss of generality we can assume that the automata recognizing them
are known. The translation from other formalisms, like ground set expressions
from [1] or regular grammars from [16], is straightforward.

The idea of the proof is to test the emptiness of the intersection of the automaton
constructed in Lemma 1 with the set of instances of the term $\#(t_0, \ldots, t_n)$.
Due to non-linear occurrences of variables in $\#(t_0, \ldots, t_n)$, this last set is, however,
not regular. For our purposes it is enough to assign the same
state of an automaton to each occurrence of the same variable.

Lemma 2. Let $A = (\Sigma, Q, \Delta, F)$ be a deterministic bottom-up tree automaton
without unreachable states, recognizing $\#(T_{\Sigma - \{\#\}} - T_{p_0}, T_{p_1}, \ldots, T_{p_n})$, as
constructed in Lemma 1. Then the set $\bigcup_{i=0}^{n} \{p_i(t) \mid t \in T_{p_i}\}$ is not a model of
the clause $p_0(t_0) \leftarrow p_1(t_1), \ldots, p_n(t_n)$ if and only if there exists a mapping
$\theta : Var(\#(t_0, \ldots, t_n)) \to Q$ such that the term $\#(t_0, \ldots, t_n)\theta$ is accepted by
the automaton $A$.

Proof. The above set is not a model of the clause if and only if there exists a
substitution $\sigma : Var(\#(t_0, \ldots, t_n)) \to T_\Sigma$ such that $t_1\sigma \in T_{p_1}, \ldots, t_n\sigma \in T_{p_n}$
and $t_0\sigma \notin T_{p_0}$. This is equivalent to the existence of a $\sigma$ such that the automaton
$A$ accepts the term $\#(t_0, \ldots, t_n)\sigma$. Thus it is enough to prove the equivalence
of the last condition with the acceptance of $\#(t_0, \ldots, t_n)\theta$ by $A$.

Now we prove this equivalence. Suppose that $A$ accepts $\#(t_0, \ldots, t_n)\sigma$ with
a run $\rho$. Note that by the determinism of $A$, there is only one possible run of
$A$ on $\#(t_0, \ldots, t_n)\sigma$, and for each occurrence of $x\sigma$ the state assigned by $\rho$ is
the same, and thus we can speak about states assigned to terms (as opposed
to occurrences of terms). Taking $\theta(x) = \rho(\sigma(x))$ we obtain $\rho(\#(t_0, \ldots, t_n)\theta) =
\rho(\#(t_0, \ldots, t_n)\sigma) \in F$, and the automaton accepts $\#(t_0, \ldots, t_n)\theta$.

Conversely, suppose there exists $\theta$ such that $\#(t_0, \ldots, t_n)\theta$ is accepted by $A$.
Since all states in $Q$ are reachable, there exists a tree $t_x$ accepted by the state
$\theta(x)$. Putting $\sigma(x) = t_x$ for all $x \in Var(\#(t_0, \ldots, t_n))$ we obtain a $\sigma$ such that $A$
accepts the term $\#(t_0, \ldots, t_n)\sigma$. □

Theorem 2. Problem 1 is decidable in DEXPTIME.

Proof. This is a consequence of Lemma 2 above: there are $|Q|^{|Var(\#(t_0, \ldots, t_n))|}$
possible mappings $\theta$; this number is exponential in the size of the input, since
$|Q|$ is exponential and $|Var(\#(t_0, \ldots, t_n))|$ is linear; each such $\theta$ can be tested in
polynomial time. □

The following corollary is a direct consequence of Theorems 1 and 2, and the
DEXPTIME-hardness result from [12].

Corollary 1. Directional type checking for logic programs wrt. arbitrary regular
types is DEXPTIME-complete.

3.2 Parameterized Complexity of Type Checking 

Let us recall the DEXPTIME-hardness proof for the type checking of logic programs
wrt. discriminative types. It is based on a reduction from the emptiness
problem for the intersection of deterministic top-down tree automata [37]. It is
shown that the program consisting of a single clause $p(X, \ldots, X)$ is well-typed
wrt. $(T_1, \ldots, T_n) \to \emptyset$ if and only if the intersection $T_1 \cap \ldots \cap T_n$ is empty. For
the hardness proof the sets $T_1, \ldots, T_n$ are chosen as discriminative regular sets of
trees, whose intersection encodes the computation of an alternating Turing machine
with polynomially bounded tape.

What is striking in this construction is that the program used here is very short
(it is only one fact) while the types are very large (the encoding of the
Turing-machine computation is done in the types, not in the program). The situation
in practice is usually the opposite: type checking is usually
applied to large programs and rather small types. A natural way to approach
such a problem is to study its parameterized complexity [18].

A parameterized problem takes as input a pair $(x, k)$ where $x$ is a word (in
our case the encoding of a logic program and a directional type) and $k$ is a
positive integer. Such a problem is called fixed-parameter linear if there exists a
function $f : \mathbb{N} \to \mathbb{N}$ and an algorithm that decides the problem and runs in time
$f(k) \cdot |x|$.

In the formulation of the problems below, $|T|$ denotes the size of the directional
type $T$. Formally, it is the sum of the lengths of the encodings of the automata
recognizing the regular sets occurring in $T$. Similarly, for a single regular set $T$,
$|T|$ denotes its size (the length of the encoding of the automaton recognizing $T$).

Problem 2 (Parameterized type checking).

Instance: a logic program $\mathcal{P}$ and a directional type $T$
Parameter: $|T|$
Question: is $\mathcal{P}$ well-typed wrt. $T$?

Theorem 3. The parameterized type-checking problem is decidable in time
$O(c^{|T|} \cdot |\mathcal{P}|)$ for some constant $c$ that does not depend on $\mathcal{P}$.

The proof of this theorem follows directly from Lemma 3 below.

Problem 3 (Parameterized version of Problem 1).

Instance: a clause $p_0(t_0) \leftarrow p_1(t_1), \ldots, p_n(t_n)$ and a family of regular sets
$T_{p_0}, T_{p_1}, \ldots, T_{p_n}$
Parameter: $k = |T_{p_0}| + \ldots + |T_{p_n}|$
Question: is the set $\bigcup_{i=0}^{n} \{p_i(t) \mid t \in T_{p_i}\}$ a model of the clause?

Lemma 3. Problem 3 is decidable in time $O(c^k \cdot m)$ for some constant $c$ that does
not depend on $m$, where $k$ is the parameter and $m$ is the size of the clause.
Proof. The idea is again to use Lemma 2. We use the notation from Lemmas 1
and 2. We traverse the term $\#(t_0, \ldots, t_n)$ top-down, checking which assumptions
have to be made on the value of the run on subterms to make the term accepted
by the automaton. These checks succeed if the assumptions about the run
on variables give rise to a function from variables to states of the automaton.

Consider a set of pairs of the form $\{(q_1, s_1), \ldots, (q_k, s_k)\}$ where $q_i$ is a state
of the automaton $A$ and $s_i$ is a subterm of $\#(t_0, \ldots, t_n)$. Intuitively, this set
expresses the information "if there exists a run $\rho$ of the automaton $A$ such
that $\rho(s_i) = q_i$ for all $i$, then $A$ accepts $\#(t_0, \ldots, t_n)$". We call such a set $S$ flat if all
terms occurring in $S$ are variables. A flat set $S$ is inconsistent if it contains two
different pairs $(q, x)$ and $(q', x)$ with the same variable $x$ and different states $q, q'$;
otherwise it is consistent. A flat and consistent set defines a function assigning
states to variables.

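Consider a function check assigning a boolean value to such sets of pairs,
defined recursively as follows.

$$check(S) = \begin{cases}
true, & \text{if } S \text{ is flat and consistent} \\
false, & \text{if } S \text{ is flat and inconsistent} \\
\bigvee_{f(q_1, \ldots, q_k) \to q \,\in\, \Delta} check\big(S - \{(q, f(s_1, \ldots, s_k))\} \cup \{(q_1, s_1), \ldots, (q_k, s_k)\}\big), & \text{otherwise}
\end{cases}$$

In the last case, $(q, f(s_1, \ldots, s_k))$ is a non-flat element of $S$.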

We claim that

1. $check(\{(s_{fin}, \#(t_0, \ldots, t_n))\}) = true$ if and only if there exists a function
$\theta : Var(\#(t_0, \ldots, t_n)) \to Q$ such that the term $\#(t_0, \ldots, t_n)\theta$ is accepted by
the automaton $A$.

2. the value of $check(\{(s_{fin}, \#(t_0, \ldots, t_n))\})$ can be computed in time
$O(|\#(t_0, \ldots, t_n)| \cdot |\Delta|)$.

The first part can be easily proved by induction on the structure of the term
$\#(t_0, \ldots, t_n)$. The run of the automaton must assign the final state $s_{fin}$ to the
term $\#(t_0, \ldots, t_n)$, which is expressed by the pair $(s_{fin}, \#(t_0, \ldots, t_n))$; each
computation step of the automaton must agree with some transition in $\Delta$, which
is expressed by the disjunction over matching transitions in the definition of
check; finally, the condition that $\theta$ is a function is expressed by the consistency
of the set $S$. Note also that by the associativity and commutativity of disjunction,
the value of check does not depend on the choice of the non-flat element
$(q, f(s_1, \ldots, s_k))$ from $S$.

For the second part, note that for each subterm $s$ of $\#(t_0, \ldots, t_n)$ there are
at most $|\Delta|$ calls to the function check that correspond to decomposing the
term $s$. Since there are exactly $|\#(t_0, \ldots, t_n)|$ such subterms, the whole work is
done in time $O(|\#(t_0, \ldots, t_n)| \cdot |\Delta|)$. □
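
The recursive definition of check can be sketched directly. The code below is a naive transcription that makes no attempt to meet the stated time bound. The encoding is an assumption of this example: variables are strings, other terms are tuples `(symbol, child, ...)`, and the deterministic transition relation is a dictionary from `(symbol, source_states)` to the target state. The tiny automaton uses abstract state names `E` and `O` (even, odd) instead of subset-states, with both $T_{p_0}$ and $T_{p_1}$ taken to be the even numerals.

```python
def check(pairs, delta):
    """pairs: frozenset of (state, term); delta: deterministic transitions."""
    for state, term in pairs:
        if not isinstance(term, str):  # a non-flat element (q, f(s1, ..., sk))
            symbol, *subterms = term
            rest = pairs - {(state, term)}
            # disjunction over all transitions f(q1, ..., qk) -> q
            return any(
                check(rest | set(zip(sources, subterms)), delta)
                for (sym, sources), target in delta.items()
                if sym == symbol
                and target == state
                and len(sources) == len(subterms)
            )
    # the set is flat: it is consistent iff it defines a function on variables
    assignment = {}
    for state, variable in pairs:
        if assignment.setdefault(variable, state) != state:
            return False
    return True


# '#' accepts when the first argument is odd (not in T_p0) and the second even.
DELTA = {
    ("zero", ()): "E",
    ("succ", ("E",)): "O",
    ("succ", ("O",)): "E",
    ("#", ("O", "E")): "FIN",
}


def violates(term):
    """True iff some assignment of states to variables makes A accept term."""
    return check(frozenset({("FIN", term)}), DELTA)
```

With these definitions, `violates(("#", ("succ", "X"), "X"))` holds, so the clause $p_0(succ(X)) \leftarrow p_1(X)$ does not preserve evenness, while `violates(("#", "X", "X"))` fails because `X` would need two different states.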

3.3 Incrementality and Infinite Signature 

A literal application of the algorithm presented above might lead to the following
problem. Suppose that some program is well-typed and we extend it by adding
a new, completely independent fragment defining a new predicate. The new
fragment may contain new function symbols that did not occur in the original
program. Since a signature is part of the definition of a tree automaton, the
old type check was done with automata over a smaller signature, and one could
argue that now the type-checking procedure has to be rerun from scratch.

However, it is fairly straightforward to extend tree automata to deal with
an infinite signature. We can simply consider an infinite signature $\Sigma_{inf}$ containing
$\Sigma$, add a new state $q$ to the automaton and say that the transition relation $\Delta$
implicitly contains all transitions of the form $f(\ldots) \to q$ for all $f \in \Sigma_{inf} - \Sigma$. Such
an automaton still has a finite set of states and an infinite (but finitely representable)
transition relation.

With such an extension of tree automata our algorithm is still correct (a
little bit of work has to be done to reason correctly about implicit transitions
and reachable states during the determinization step); it is still fixed-parameter
linear (when traversing the term $\#(t_0, \ldots, t_n)$ there is no need to look at function
symbols that do not occur in this term).

Another problem of the same nature is that numbers (integers or reals) do
not form a regular set. In order to extend tree automata to deal with these sets
it is enough to treat each number as a constant symbol, and to add two states int and
real and infinitely many implicit transitions $i \to \mathit{int}$ and $r \to \mathit{real}$ for all integers $i$
and reals $r$.
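
A finitely representable transition relation of this kind can be sketched as a lookup function that falls back on implicit transitions. In this sketch the state name `q_other` for the added state is hypothetical, Python `int` and `float` values stand in for integer and real constants, and the two explicit transitions are the ones for an overloaded `+`.

```python
SINK = "q_other"  # hypothetical name for the state added for unknown symbols


def lookup(explicit, symbol, child_states):
    """Explicit transitions first, then the implicit ones."""
    if (symbol, child_states) in explicit:
        return explicit[(symbol, child_states)]
    if child_states == () and not isinstance(symbol, bool):
        if isinstance(symbol, int):
            return "int"   # implicit transition i -> int
        if isinstance(symbol, float):
            return "real"  # implicit transition r -> real
    return SINK            # implicit transition f(...) -> q_other


def run(term, explicit):
    """Bottom-up run; leaves are constants, inner nodes are tuples."""
    if isinstance(term, tuple):
        symbol, *children = term
        states = tuple(run(child, explicit) for child in children)
    else:
        symbol, states = term, ()
    return lookup(explicit, symbol, states)


EXPLICIT = {
    ("+", ("int", "int")): "expr",
    ("+", ("real", "real")): "expr",
}
```

Only the symbols that actually occur in the traversed term are ever looked up, so a function symbol introduced by a later program fragment needs no change to `EXPLICIT`.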

4 Conclusion 

We proved the decidability in DEXPTIME and fixed-parameter linearity of
directional type checking for logic programs wrt. general regular types. This
solves a problem that had been open since 1994 and improves several earlier partial
solutions.

The procedure we presented is optimal from the complexity point of view, and it
is also incremental. This, together with linear complexity in the size of the program,
gives us hope that the type system may be usable in practice.

There are some obvious directions for future work. One is the implementation
of the system to see how it behaves in practice. Further, an extension to
constraint logic programming, negation, etc. would be interesting. The extension
to polymorphic types seems not to be very difficult.
Abstract. We present the first formalization of implementation strategies
for first-class continuations. The formalization hinges on abstract
machines for continuation-passing style (CPS) programs with a special
treatment for the current continuation, accounting for the essence of
first-class continuations. These abstract machines are proven equivalent
to a standard, substitution-based abstract machine. The proof techniques
work uniformly for various representations of continuations. As a
byproduct, we also present a formal proof of the two folklore theorems
that one continuation identifier is enough for second-class continuations
and that second-class continuations are stackable.

A large body of work exists on implementing continuations, but it is
predominantly empirical and implementation-oriented. In contrast, our
formalization abstracts the essence of first-class continuations and provides
a uniform setting for specifying and formalizing their representation.



1 Introduction 

Be it for coroutines, threads, mobile code, interactive computer games, or
computer sessions, one often needs to suspend and to resume a computation.
Suspending a computation amounts to saving away its state, and resuming a suspended
computation amounts to restoring the saved state. Such saved copies may be
ephemeral and restored at most once (e.g., coroutines, threads, and computer
sessions that were 'saved to disk'), or they may need to be restored repeatedly
(e.g., in a computer game). This functionality is reminiscent of continuations,
which represent the rest of a computation [22].

In this article, we consider how to implement first-class continuations. A
wealth of empirical techniques exist to take a snapshot of control during the
execution of a program (call/cc) and to restore this snapshot (throw): SML/NJ,
for example, allocates continuations entirely in the heap, reducing call/cc and
throw to a matter of swapping pointers [1]; T and Scheme 48 allocate
continuations on a stack, copying this stack in the heap and back to account for
call/cc and throw [16,17];¹ and PC Scheme, Chez Scheme, and Larceny allocate
continuations on a segmented stack [2,4,15]. Clinger, Hartheimer, and Ost's
recent article [4] provides a comprehensive overview of implementation strategies
for first-class continuations and of their issues: ideally, first-class continuations
should exert zero overhead for programs that do not use them.

Our goal and non-goal: We formalize implementation strategies for first-class
continuations. We do not formalize first-class continuations per se (cf., e.g.,
Felleisen's PhD thesis [12] or Duba, Harper, and MacQueen's formal account of
call/cc in ML [10]).

Our work: We consider abstract machines for continuation-passing style (CPS)
programs, focusing on the implementation of continuations. As a stepping stone,
we formalize the folklore theorem that one register is enough to implement
second-class continuations. We then formalize the three implementation
techniques for first-class continuations mentioned above: heap, stack, and segmented
stack. The formalization and its proof techniques (structural induction on terms
and on derivation trees) are uniform: besides clarifying what it means to
implement continuations, be they second-class or first-class, our work provides a
platform to state and prove the correctness of each implementation. Also, this
platform is not restricted to CPS programs: through Flanagan et al.'s results [13],
it is applicable to direct-style programs if one represents control with a stack of
evaluation contexts instead of a stack of functions.



1.1 Related Work 

The four works most closely related to ours are Clinger, Hartheimer, and Ost's
overview of implementation strategies for first-class continuations [4];
Flanagan, Sabry, Duba, and Felleisen's account of compiling with continuations and,
more specifically, their first two abstract machines [13]; Danvy and Lawall's
syntactic characterization of second-class and first-class continuations in CPS
programs [8]; and Danvy, Dzafic, and Pfenning's work on the occurrence of
continuation parameters in CPS programs [6,9,11].



1.2 Overview 

Section 2 presents our source language: the $\lambda$-calculus in direct style and in CPS,
the CPS transformation, and an abstract machine for CPS programs that will be
our reference point here. This standard machine treats continuation identifiers on
par with all the other identifiers. The rest of this article focuses on continuation
identifiers and how to represent their bindings, i.e., on the essence of how to
implement continuations.

¹ This strategy is usually attributed to Drew McDermott in the late '70s [19], but
apparently it was already considered in the early '70s at Queen Mary and Westfield
College to implement PAL (John C. Reynolds, personal communication, Aarhus,
Denmark, fall 1999).
Section 3 addresses second-class continuations. In a CPS program with
second-class continuations, continuation identifiers are not only linear (in the sense of
Linear Logic), but they also denote a stackable resource, and indeed it is
folklore that second-class continuations can be implemented LIFO on a "control
stack". We formalize this folklore by characterizing second-class continuations
syntactically in a CPS program and by presenting an abstract machine where
the bindings of continuation identifiers are represented with a stack. We show
this stack machine to be equivalent to the standard one.

Section 4 addresses first-class continuations. In a CPS program with
first-class continuations, continuation identifiers do not denote a stackable resource
in general. First-class continuations, however, are relatively rare, and thus over
the years, "zero-overhead" implementations have been sought [4]:
implementations that do support first-class continuations but only tax programs that use
them. We consider the traditional strategy of stack-allocating all continuations
by default, as if they were all second-class, and of copying this stack in case
of first-class continuations. We formalize this empirical strategy with a new
abstract machine, which we show to be equivalent to the standard one.

Section 5 outlines how to formalize alternative implementation strategies,
such as segmenting the stack and recycling unshared continuations.



2 CPS Programs 

We consider closed programs: direct-style (DS) $\lambda$-terms with literals. The BNF
of DS programs is displayed in Figure 1. Assuming a call-by-value evaluation
strategy, the BNF of CPS programs is displayed in Figure 2. CPS programs
are prototypically obtained by CPS-transforming DS programs, as defined in
Figure 3 [7,20,21].

Figure 4 displays our starting point: a standard abstract machine implementing
$\beta$-reduction for CPS programs. This machine is a simplified version of another
machine studied jointly with Belmina Dzafic and Frank Pfenning [6,9,11].
We use two judgments, indexed by the syntactic categories of CPS terms. The
judgment

$$\vdash^{\mathrm{CProg}}_{\mathrm{std}} p \hookrightarrow a$$

is satisfied whenever a CPS program $p$ evaluates to an answer $a$. The auxiliary
judgment

$$\vdash^{\mathrm{CExp}}_{\mathrm{std}} e \hookrightarrow a$$

is satisfied whenever a CPS expression $e$ evaluates to an answer $a$. The machine
starts and stops with the initial continuation $k_{init}$, which is a distinguished fresh
continuation identifier. Answers can be either the trivial expressions $\ell$ or $\lambda x.\lambda k.e$,
or the error token.

For expository simplicity, our standard machine uses substitutions to implement
variable bindings. Alternatively and equivalently, it could use an environment
and represent functional values as closures [18]. And indeed Flanagan et
al. present a similar standard abstract machine which uses an environment [13,
Figure 4].
$p \in \mathrm{DProg}$ (DS programs): $p ::= e$
$e \in \mathrm{DExp}$ (DS expressions): $e ::= e_0\, e_1 \mid t$
$t \in \mathrm{DTriv}$ (DS trivial expressions): $t ::= \ell \mid x \mid \lambda x.e$
$\ell \in \mathrm{Lit}$ (literals)
$x \in \mathrm{Ide}$ (identifiers)

Fig. 1. BNF of DS programs



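$p \in \mathrm{CProg}$ (CPS programs): $p ::= \lambda k.e$
$e \in \mathrm{CExp}$ (CPS (serious) expressions): $e ::= t_0\, t_1\, c \mid c\, t$
$t \in \mathrm{CTriv}$ (CPS trivial expressions): $t ::= \ell \mid x \mid v \mid \lambda x.\lambda k.e$
$c \in \mathrm{Cont}$ (continuations): $c ::= \lambda v.e \mid k$
$\ell \in \mathrm{Lit}$ (literals)
$x \in \mathrm{Ide}$ (source identifiers)
$k \in \mathrm{IdeC}$ (fresh continuation identifiers)
$v \in \mathrm{IdeV}$ (fresh parameters of continuations)
$a \in \mathrm{Answer}$ (CPS answers): $a ::= \ell \mid \lambda x.\lambda k.e \mid \mathit{error}$

Fig. 2. BNF of CPS programs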



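$[\![e]\!]^{\mathrm{DProg}}_{\mathrm{cps}} = \lambda k.[\![e]\!]^{\mathrm{DExp}}_{\mathrm{cps}}\, k$, where $k$ is fresh
$[\![e_0\, e_1]\!]^{\mathrm{DExp}}_{\mathrm{cps}}\, c = [\![e_0]\!]^{\mathrm{DExp}}_{\mathrm{cps}}\, \lambda v_0.[\![e_1]\!]^{\mathrm{DExp}}_{\mathrm{cps}}\, \lambda v_1.v_0\, v_1\, c$, where $v_0$ and $v_1$ are fresh
$[\![t]\!]^{\mathrm{DExp}}_{\mathrm{cps}}\, c = c\, [\![t]\!]^{\mathrm{DTriv}}_{\mathrm{cps}}$
$[\![\ell]\!]^{\mathrm{DTriv}}_{\mathrm{cps}} = \ell$
$[\![x]\!]^{\mathrm{DTriv}}_{\mathrm{cps}} = x$
$[\![\lambda x.e]\!]^{\mathrm{DTriv}}_{\mathrm{cps}} = \lambda x.\lambda k.[\![e]\!]^{\mathrm{DExp}}_{\mathrm{cps}}\, k$, where $k$ is fresh

Fig. 3. The left-to-right, call-by-value CPS transformation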
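
The transformation can be sketched as a small recursive function. The tagged-tuple encoding of terms, the generated identifier names, and the use of a single `var` tag for both source identifiers and continuation parameters are assumptions of this sketch, not part of the figure.

```python
import itertools


def cps_program(expression):
    """Left-to-right, call-by-value CPS transformation of a DS term.

    DS terms:  ("lit", n) | ("var", x) | ("lam", x, e) | ("app", e0, e1)
    CPS terms: ("lamk", k, e)                          program
               ("call", t0, t1, c) | ("ret", c, t)     serious expressions
               ("lit", n) | ("var", x) | ("lam", x, k, e)  trivial expressions
               ("kvar", k) | ("klam", v, e)            continuations
    """
    counter = itertools.count()

    def fresh(prefix):
        return f"{prefix}{next(counter)}"

    def triv(t):
        if t[0] in ("lit", "var"):
            return t
        _, x, body = t
        k = fresh("k")
        return ("lam", x, k, expr(body, ("kvar", k)))

    def expr(e, c):
        if e[0] == "app":
            _, e0, e1 = e
            v0, v1 = fresh("v"), fresh("v")
            call = ("call", ("var", v0), ("var", v1), c)
            return expr(e0, ("klam", v0, expr(e1, ("klam", v1, call))))
        return ("ret", c, triv(e))  # trivial term: send it to c

    k = fresh("k")
    return ("lamk", k, expr(expression, ("kvar", k)))


identity_applied = ("app", ("lam", "x", ("var", "x")), ("lit", 42))
transformed = cps_program(identity_applied)
```

In `transformed`, the application becomes a chain of two continuation abstractions that name the operator and the operand before the call is made with the program's continuation.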



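$$\frac{\vdash^{\mathrm{CExp}}_{\mathrm{std}} e[k_{init}/k] \hookrightarrow a}{\vdash^{\mathrm{CProg}}_{\mathrm{std}} \lambda k.e \hookrightarrow a}$$

$$\vdash^{\mathrm{CExp}}_{\mathrm{std}} \ell\, t\, c \hookrightarrow \mathit{error}$$

$$\frac{\vdash^{\mathrm{CExp}}_{\mathrm{std}} e[t/x, c/k] \hookrightarrow a}{\vdash^{\mathrm{CExp}}_{\mathrm{std}} (\lambda x.\lambda k.e)\, t\, c \hookrightarrow a}$$

$$\frac{\vdash^{\mathrm{CExp}}_{\mathrm{std}} e[t/v] \hookrightarrow a}{\vdash^{\mathrm{CExp}}_{\mathrm{std}} (\lambda v.e)\, t \hookrightarrow a}$$

$$\vdash^{\mathrm{CExp}}_{\mathrm{std}} k_{init}\, t \hookrightarrow t$$

Fig. 4. Standard machine for CPS programs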
3 A Stack Machine for CPS Programs with Second-Class
Continuations

As a stepping stone, this section formalizes the folklore theorem that in the
absence of first-class continuations, one continuation identifier is enough, i.e., in
Figure 2, IdeC can be defined as a singleton set. To this end, we prove that in
the output of the CPS transformation, only one continuation identifier is indeed
enough. We also prove that this property is closed under arbitrary $\beta$-reduction.
We then rephrase the BNF of CPS programs with IdeC as a singleton set
(Section 3.1). In the new BNF, only CPS programs with second-class continuations
can be expressed. We present a stack machine for these CPS programs and we
prove it equivalent to the standard machine of Figure 4 (Section 3.2). Flanagan
et al. present a similar abstract machine [13, Figure 5], but without relating it
formally to their standard abstract machine.

3.1 One Continuation Identifier is Enough 

Each expression in a DS program occurs in one evaluation context.
Correspondingly, each expression in a CPS program has one continuation. We formalize
this observation in terms of continuation identifiers with the judgment defined in
Figure 5, where FC($t$) yields the set of continuation identifiers occurring free in $t$.

Definition 1 (Second-class position, second-class continuations). In
a continuation abstraction $\lambda k.e$, we say that $k$ occurs in second-class position
and denotes a second-class continuation whenever the judgment $k \vdash^{\mathrm{CExp}}_{\mathrm{2cc}} e$ is
satisfied.

Below, we prove that actually, in the output of the CPS transformation, all
continuation identifiers denote second-class continuations. In Figure 6, we thus
generalize our judgment to a whole CPS program.

Definition 2 (2Cont-validity). We say that a CPS program $p$ is 2Cont-valid
whenever the judgment $\vdash^{\mathrm{CProg}}_{\mathrm{2cc*}} p$ is satisfied. Informally, $\vdash^{\mathrm{CProg}}_{\mathrm{2cc*}} p$ holds if and
only if all continuation abstractions $\lambda k.e$ occurring in $p$ satisfy $k \vdash^{\mathrm{CExp}}_{\mathrm{2cc}} e$.

Lemma 1 (The CPS transformation yields 2Cont-valid programs).
For any $p \in \mathrm{DProg}$, $\vdash^{\mathrm{CProg}}_{\mathrm{2cc*}} [\![p]\!]^{\mathrm{DProg}}_{\mathrm{cps}}$.

Proof. A straightforward induction over DS programs. □



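$$\frac{k \notin \mathrm{FC}(t_0) \qquad k \notin \mathrm{FC}(t_1) \qquad k \vdash^{\mathrm{Cont}}_{\mathrm{2cc}} c}{k \vdash^{\mathrm{CExp}}_{\mathrm{2cc}} t_0\, t_1\, c}$$

$$\frac{k \vdash^{\mathrm{Cont}}_{\mathrm{2cc}} c \qquad k \notin \mathrm{FC}(t)}{k \vdash^{\mathrm{CExp}}_{\mathrm{2cc}} c\, t}$$

$$\frac{k \vdash^{\mathrm{CExp}}_{\mathrm{2cc}} e}{k \vdash^{\mathrm{Cont}}_{\mathrm{2cc}} \lambda v.e}$$

$$k \vdash^{\mathrm{Cont}}_{\mathrm{2cc}} k$$

Fig. 5. Characterization of a second-class continuation abstraction $\lambda k.e$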
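
The judgment of Figure 5 can be read as a recursive predicate. The sketch below assumes a tagged-tuple encoding of CPS terms (`call`, `ret`, `lit`, `var`, `lam`, `kvar`, `klam`), which is a choice made for this example only; `fc` computes the free continuation identifiers and `second_class` follows the four rules.

```python
def fc(term):
    """Free continuation identifiers of a CPS term."""
    tag = term[0]
    if tag in ("lit", "var"):
        return set()
    if tag == "kvar":                  # ("kvar", k)
        return {term[1]}
    if tag == "lam":                   # ("lam", x, k, e) binds k in e
        return fc(term[3]) - {term[2]}
    if tag == "klam":                  # ("klam", v, e)
        return fc(term[2])
    if tag == "call":                  # ("call", t0, t1, c)
        return fc(term[1]) | fc(term[2]) | fc(term[3])
    return fc(term[1]) | fc(term[2])   # ("ret", c, t)


def second_class(k, term):
    """Does k occur in second-class position in the expression or continuation?"""
    tag = term[0]
    if tag == "call":
        _, t0, t1, c = term
        return k not in fc(t0) and k not in fc(t1) and second_class(k, c)
    if tag == "ret":
        _, c, t = term
        return second_class(k, c) and k not in fc(t)
    if tag == "klam":
        return second_class(k, term[2])
    return tag == "kvar" and term[1] == k


# Body of the identity function, lambda x. lambda k. k x
identity_body = ("ret", ("kvar", "k"), ("var", "x"))

# Body of a call/cc-like term, lambda f. lambda k. f (lambda v. lambda k2. k v) k
callcc_body = (
    "call",
    ("var", "f"),
    ("lam", "v", "k2", ("ret", ("kvar", "k"), ("var", "v"))),
    ("kvar", "k"),
)
```

Here `second_class("k", identity_body)` holds, whereas `second_class("k", callcc_body)` fails because `k` occurs free in a trivial term, which is how a first-class use of the continuation shows up syntactically.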





Formalizing Implementation Strategies for First-Class Continuations 






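$$\frac{k \vdash^{\mathrm{CExp}}_{\mathrm{2cc}} e \qquad \vdash^{\mathrm{CExp}}_{\mathrm{2cc*}} e}{\vdash^{\mathrm{CProg}}_{\mathrm{2cc*}} \lambda k.e}$$

$$\frac{\vdash^{\mathrm{CTriv}}_{\mathrm{2cc*}} t_0 \qquad \vdash^{\mathrm{CTriv}}_{\mathrm{2cc*}} t_1 \qquad \vdash^{\mathrm{Cont}}_{\mathrm{2cc*}} c}{\vdash^{\mathrm{CExp}}_{\mathrm{2cc*}} t_0\, t_1\, c}
\qquad
\frac{\vdash^{\mathrm{Cont}}_{\mathrm{2cc*}} c \qquad \vdash^{\mathrm{CTriv}}_{\mathrm{2cc*}} t}{\vdash^{\mathrm{CExp}}_{\mathrm{2cc*}} c\, t}$$

$$\vdash^{\mathrm{CTriv}}_{\mathrm{2cc*}} \ell
\qquad
\vdash^{\mathrm{CTriv}}_{\mathrm{2cc*}} x
\qquad
\vdash^{\mathrm{CTriv}}_{\mathrm{2cc*}} v
\qquad
\frac{k \vdash^{\mathrm{CExp}}_{\mathrm{2cc}} e \qquad \vdash^{\mathrm{CExp}}_{\mathrm{2cc*}} e}{\vdash^{\mathrm{CTriv}}_{\mathrm{2cc*}} \lambda x.\lambda k.e}$$

$$\frac{\vdash^{\mathrm{CExp}}_{\mathrm{2cc*}} e}{\vdash^{\mathrm{Cont}}_{\mathrm{2cc*}} \lambda v.e}
\qquad
\vdash^{\mathrm{Cont}}_{\mathrm{2cc*}} k$$

Fig. 6. Characterization of a CPS program with second-class continuations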



Furthermore, 2Cont-validity is closed under $\beta$-reduction, which means that it
is preserved by regular evaluation as well as by the arbitrary simplifications
of a CPS compiler [21]. The corresponding formal statement and its proof are
straightforward and omitted here: we rely on them in the proof of Theorem 1.

Therefore each use of each continuation identifier $k$ is uniquely determined,
capturing the fact that in the BNF of 2Cont-valid CPS programs, one continuation
identifier is enough. To emphasize this fact, let us specialize the BNF of
Figure 2 by defining IdeC as the singleton set $\{\star\}$, yielding the BNF of 2CPS
programs displayed in Figure 7.



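$p \in \mathrm{2CProg}$ (2CPS programs): $p ::= \lambda\star.e$
$e \in \mathrm{2CExp}$ (2CPS (serious) expressions): $e ::= t_0\, t_1\, c \mid c\, t$
$t \in \mathrm{2CTriv}$ (2CPS trivial expressions): $t ::= \ell \mid x \mid v \mid \lambda x.\lambda\star.e$
$c \in \mathrm{2Cont}$ (continuations): $c ::= \lambda v.e \mid \star$
$\ell \in \mathrm{Lit}$ (literals)
$x \in \mathrm{Ide}$ (source identifiers)
$\star \in \mathrm{Token}$ (single continuation identifier)
$v \in \mathrm{IdeV}$ (fresh parameters of continuations)
$a \in \mathrm{2Answer}$ (2CPS answers): $a ::= \ell \mid \lambda x.\lambda\star.e \mid \mathit{error}$

Fig. 7. BNF of 2CPS programs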



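Let $[\![\cdot]\!]^{\mathrm{CProg}}_{\mathrm{strip}}$ denote the straightforward homomorphic mapping from a
2Cont-valid CPS program to a 2CPS program and $[\![\cdot]\!]^{\mathrm{2CProg}}_{\mathrm{name}}$ denote its inverse, such that
$\forall p \in \mathrm{CProg}$, $[\![[\![p]\!]^{\mathrm{CProg}}_{\mathrm{strip}}]\!]^{\mathrm{2CProg}}_{\mathrm{name}} =_\alpha p$ whenever the judgment $\vdash^{\mathrm{CProg}}_{\mathrm{2cc*}} p$ is satisfied,
and $\forall p' \in \mathrm{2CProg}$, $[\![[\![p']\!]^{\mathrm{2CProg}}_{\mathrm{name}}]\!]^{\mathrm{CProg}}_{\mathrm{strip}} = p'$. These two translations are generalized
in Section 4 and thus we omit their definition here.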

3.2 A Stack Machine for 2CPS Programs 

Figure 8 displays a stack-based abstract machine for 2CPS programs. We
obtained it from the standard machine of Section 2, page 91, by implementing the
bindings of continuation identifiers with a global "control stack" $\varphi$.

$\varphi \in \mathrm{2CStack}$ (control stacks): $\varphi ::= \bullet \mid \varphi, \lambda v.e$

The machine starts and stops with an empty control stack $\bullet$. When a function
is applied, its continuation is pushed on $\varphi$. When a continuation is needed, it
is popped from $\varphi$. If $\varphi$ is empty, the intermediate result sent to the
continuation is the final answer. We distinguish tail calls (i.e., function calls where the
continuation is $\star$) by not pushing anything on $\varphi$, thereby achieving proper tail
recursion.



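$$\frac{\bullet \vdash^{\mathrm{2CExp}}_{\mathrm{2cc}} e \hookrightarrow a}{\vdash^{\mathrm{2CProg}}_{\mathrm{2cc}} \lambda\star.e \hookrightarrow a}
\qquad
\varphi \vdash^{\mathrm{2CExp}}_{\mathrm{2cc}} \ell\, t\, c \hookrightarrow \mathit{error}$$

$$\frac{\varphi \vdash^{\mathrm{2CExp}}_{\mathrm{2cc}} e[t/x] \hookrightarrow a}{\varphi \vdash^{\mathrm{2CExp}}_{\mathrm{2cc}} (\lambda x.\lambda\star.e)\, t\, \star \hookrightarrow a}
\qquad
\frac{\varphi, \lambda v.e' \vdash^{\mathrm{2CExp}}_{\mathrm{2cc}} e[t/x] \hookrightarrow a}{\varphi \vdash^{\mathrm{2CExp}}_{\mathrm{2cc}} (\lambda x.\lambda\star.e)\, t\, \lambda v.e' \hookrightarrow a}$$

$$\frac{\varphi \vdash^{\mathrm{2CExp}}_{\mathrm{2cc}} e[t/v] \hookrightarrow a}{\varphi \vdash^{\mathrm{2CExp}}_{\mathrm{2cc}} (\lambda v.e)\, t \hookrightarrow a}$$

$$\bullet \vdash^{\mathrm{2CExp}}_{\mathrm{2cc}} \star\, t \hookrightarrow t
\qquad
\frac{\varphi \vdash^{\mathrm{2CExp}}_{\mathrm{2cc}} e[t/v] \hookrightarrow a}{\varphi, \lambda v.e \vdash^{\mathrm{2CExp}}_{\mathrm{2cc}} \star\, t \hookrightarrow a}$$

Fig. 8. Stack machine for 2CPS programs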



N.B. The machine does not substitute continuations for continuation identifiers,
and therefore one might be surprised by the rule handling the redex $(\lambda v.e)\, t$.
Such redexes, however, can occur in the source program.
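
The stack machine can be sketched as a loop over a Python list used as the control stack. The tagged-tuple encoding of 2CPS terms (with `("star",)` for the single continuation identifier) and the naive substitution, which is adequate only because the programs considered are closed, are assumptions of this example.

```python
def subst(term, name, value):
    """Replace free occurrences of the identifier name by value."""
    tag = term[0]
    if tag == "var":
        return value if term[1] == name else term
    if tag in ("lit", "star"):
        return term
    if tag in ("lam", "klam"):  # ("lam", x, e) and ("klam", v, e) bind an identifier
        _, bound, body = term
        if bound == name:
            return term
        return (tag, bound, subst(body, name, value))
    # ("call", t0, t1, c) and ("ret", c, t): substitute in every component
    return (tag,) + tuple(subst(part, name, value) for part in term[1:])


def run(expression):
    """Evaluate a closed 2CPS expression, starting from an empty control stack."""
    stack = []
    e = expression
    while True:
        if e[0] == "call":
            _, t0, t1, c = e
            if t0[0] != "lam":
                # applying a literal is an error; closed programs leave no other case
                return "error"
            if c[0] == "klam":
                stack.append(c)      # non-tail call: push the continuation
            e = subst(t0[2], t0[1], t1)
        else:                        # ("ret", c, t)
            _, c, t = e
            if c[0] == "star":
                if not stack:
                    return t         # empty stack: t is the final answer
                c = stack.pop()      # otherwise resume the saved continuation
            e = subst(c[2], c[1], t)


ID_X = ("lam", "x", ("ret", ("star",), ("var", "x")))
ID_Y = ("lam", "y", ("ret", ("star",), ("var", "y")))

# 2CPS counterpart of (lambda x. x) 42: only a tail call, nothing is pushed
tail_only = ("ret", ("klam", "v1", ("ret", ("klam", "v2",
             ("call", ("var", "v1"), ("var", "v2"), ("star",))), ("lit", 42))), ID_X)

# 2CPS counterpart of (lambda x. x) ((lambda y. y) 7): the inner call pushes
outer_cont = ("klam", "v1", ("call", ("var", "v0"), ("var", "v1"), ("star",)))
nested = ("ret", ("klam", "v0", ("ret", ("klam", "v2", ("ret", ("klam", "v3",
          ("call", ("var", "v2"), ("var", "v3"), outer_cont)), ("lit", 7))), ID_Y)), ID_X)
```

Running `tail_only` never touches the stack, while running `nested` pushes one continuation for the inner call and pops it when the inner identity returns.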

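Formally, the judgment

$$\vdash^{\mathrm{2CProg}}_{\mathrm{2cc}} p \hookrightarrow a$$

is satisfied whenever a CPS program $p \in \mathrm{2CProg}$ evaluates to an answer $a \in \mathrm{2Answer}$.
The auxiliary judgment

$$\varphi \vdash^{\mathrm{2CExp}}_{\mathrm{2cc}} e \hookrightarrow a$$

is satisfied whenever an expression $e \in \mathrm{2CExp}$ evaluates to an answer $a$, given a
control stack $\varphi \in \mathrm{2CStack}$.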

We prove the equivalence between the stack machine and the standard ma- 
chine by showing that the computations for each abstract machine (represented 
by derivations) are in bijective correspondence. To this end, we define a “control- 
stack substitution” over the state of the stack machine (i.e., expression under 
evaluation and current control stack) to obtain the state of the standard ma- 
chine (i.e., expression under evaluation). We define control-stack substitution 
inductively over 2CPS expressions and continuations. 

Definition 3 (Control-stack substitution for 2CPS programs). Given a
stack $\varphi$ of 2Cont continuations, the stack substitution of any $e \in \mathrm{2CExp}$ (resp.
$c \in \mathrm{2Cont}$), noted $e\{\varphi\}_2$ (resp. $c\{\varphi\}_2$), yields a CExp expression (resp. a Cont
continuation) and is defined as follows.

$(t_0\, t_1\, c)\{\varphi\}_2 = [\![t_0]\!]^{\mathrm{2CTriv}}_{\mathrm{name}}\, [\![t_1]\!]^{\mathrm{2CTriv}}_{\mathrm{name}}\, (c\{\varphi\}_2)$
$(c\, t)\{\varphi\}_2 = (c\{\varphi\}_2)\, [\![t]\!]^{\mathrm{2CTriv}}_{\mathrm{name}}$

$(\lambda v.e)\{\varphi\}_2 = \lambda v.(e\{\varphi\}_2)$
$\star\{\bullet\}_2 = k_{init}$
$\star\{\varphi, \lambda v.e\}_2 = \lambda v.(e\{\varphi\}_2)$

Stack substitution is our key tool for mapping a state of the stack machine
into a state of the standard machine. It yields CExp expressions and Cont
continuations that have one free continuation identifier: $k_{init}$.



Lemma 2 (2Cont-validity of stack-substituted expressions and continuations).

1. For any $e \in \mathrm{2CExp}$ and for any stack of 2Cont continuations $\varphi$, the judgment
   $k_{\mathrm{init}} \models^{\mathrm{CExp}}_{\mathrm{2Cont}} e\{\varphi\}_2$ is satisfied.

2. For any $c \in \mathrm{2Cont}$ and for any stack of 2Cont continuations $\varphi$, the judgment
   $k_{\mathrm{init}} \models^{\mathrm{Cont}}_{\mathrm{2Cont}} c\{\varphi\}_2$ is satisfied.

Proof. By mutual induction on the structure of $e$ and $c$. □



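Lemma 3 (Control-stack substitution for 2CPS programs).

1. For any $e' \in \mathrm{CExp}$ satisfying $k \models^{\mathrm{CExp}}_{\mathrm{2Cont}} e'$ for some $k$
   and for any stack of 2Cont continuations $\varphi$,
   $[\![e']\!]^{\mathrm{CExp}}_{\mathrm{strip}}k\,\{\varphi\}_2 = e'[\star\{\varphi\}_2/k]$.

2. For any $e \in \mathrm{2CExp}$, for any $t' \in \mathrm{CTriv}$ satisfying
   $\models^{\mathrm{CTriv}}_{\mathrm{2Cont}} t'$, for any identifier $i$ in Ide or in IdeV, and for
   any stack of 2Cont continuations $\varphi$,
   $e[[\![t']\!]^{\mathrm{CTriv}}_{\mathrm{strip}}/i]\{\varphi\}_2 = e\{\varphi\}_2[t'/i]$.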

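Theorem 1 (Simulation). The stack machine of Figure 8 and the standard
machine are equivalent:

1. For any 2Cont-valid CPS program $p$,
   $\vdash^{\mathrm{CProg}}_{\mathrm{std}} p \hookrightarrow a$ if and only if
   $\vdash^{\mathrm{2CProg}}_{\mathrm{2cc}} [\![p]\!]^{\mathrm{CProg}}_{\mathrm{strip}} \hookrightarrow [\![a]\!]^{\mathrm{Answer}}_{\mathrm{strip}}$.

2. For any CPS expression $e$ satisfying $k \models^{\mathrm{CExp}}_{\mathrm{2Cont}} e$ for some $k$
   and for any stack of 2Cont continuations $\varphi$,
   $\vdash^{\mathrm{CExp}}_{\mathrm{std}} [\![e]\!]^{\mathrm{CExp}}_{\mathrm{strip}}k\,\{\varphi\}_2 \hookrightarrow a$
   if and only if
   $\varphi \vdash^{\mathrm{2CExp}}_{\mathrm{2cc}} [\![e]\!]^{\mathrm{CExp}}_{\mathrm{strip}}k \hookrightarrow [\![a]\!]^{\mathrm{Answer}}_{\mathrm{strip}}$.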

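Proof. The theorem follows in each direction by an induction over the structure
of the derivations, using Lemma 3. Let us show the case of tail calls in one
direction.

Case
\[
\mathcal{E} =
\dfrac{\begin{array}{c}\mathcal{E}_1 \\
       \varphi \vdash^{\mathrm{2CExp}}_{\mathrm{2cc}} e[t/x] \hookrightarrow [\![a]\!]^{\mathrm{Answer}}_{\mathrm{strip}}\end{array}}
      {\varphi \vdash^{\mathrm{2CExp}}_{\mathrm{2cc}} (\lambda x.\lambda\star.e)\ t\ \star \hookrightarrow [\![a]\!]^{\mathrm{Answer}}_{\mathrm{strip}}}
\]
where $\mathcal{E}_1$ names the derivation ending in
$\varphi \vdash^{\mathrm{2CExp}}_{\mathrm{2cc}} e[t/x] \hookrightarrow [\![a]\!]^{\mathrm{Answer}}_{\mathrm{strip}}$.

By applying the induction hypothesis to $\mathcal{E}_1$, we obtain a derivation
\[
\begin{array}{c}\mathcal{E}'_1 \\
\vdash^{\mathrm{CExp}}_{\mathrm{std}} e[t/x]\{\varphi\}_2 \hookrightarrow a\end{array}
\]

Since $e[t/x]$ is a 2CPS expression, there exists a CPS expression $e'$ satisfying
$k \models^{\mathrm{CExp}}_{\mathrm{2Cont}} e'$ for some $k$ and there exists a CPS trivial
expression $t'$ satisfying $\models^{\mathrm{CTriv}}_{\mathrm{2Cont}} t'$ such that
$e = [\![e']\!]^{\mathrm{CExp}}_{\mathrm{strip}}k$ and $t = [\![t']\!]^{\mathrm{CTriv}}_{\mathrm{strip}}$.
By Lemma 3,
\[
\begin{array}{rcl}
e[t/x]\{\varphi\}_2 &=& [\![e']\!]^{\mathrm{CExp}}_{\mathrm{strip}}k\,\{\varphi\}_2[t'/x] \\
 &=& e'[\star\{\varphi\}_2/k][t'/x] \\
 &=& e'[t'/x, \star\{\varphi\}_2/k]
\end{array}
\]
because $t'$ has no free $k$ and $\varphi$ has no free $x$.

By inference,
\[
\dfrac{\vdash^{\mathrm{CExp}}_{\mathrm{std}} e'[t'/x, \star\{\varphi\}_2/k] \hookrightarrow a}
      {\vdash^{\mathrm{CExp}}_{\mathrm{std}} (\lambda x.\lambda k.e')\ t'\ (\star\{\varphi\}_2) \hookrightarrow a}
\]
Now by definition of stack substitution,
\[
(\lambda x.\lambda k.e')\ t'\ (\star\{\varphi\}_2) = [\![(\lambda x.\lambda k.e')\ t'\ k']\!]^{\mathrm{CExp}}_{\mathrm{strip}}k'\,\{\varphi\}_2
\quad \mbox{for some } k'.
\]
In other words, there exists a derivation
\[
\mathcal{E}' =
\dfrac{\begin{array}{c}\mathcal{E}'_1 \\
       \vdash^{\mathrm{CExp}}_{\mathrm{std}} [\![e'[t'/x]]\!]^{\mathrm{CExp}}_{\mathrm{strip}}k\,\{\varphi\}_2 \hookrightarrow a\end{array}}
      {\vdash^{\mathrm{CExp}}_{\mathrm{std}} [\![(\lambda x.\lambda k.e')\ t'\ k']\!]^{\mathrm{CExp}}_{\mathrm{strip}}k'\,\{\varphi\}_2 \hookrightarrow a}
\]
which is what we wanted to show. □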

3.3 Summary and Conclusion 

As a stepping stone towards Section 4, we have formalized and proven two 
folklore theorems: (1) for CPS programs with second-class continuations, one 
identifier is enough; and (2) the bindings of continuation identifiers can be im- 
plemented with a stack for CPS programs with second-class continuations. To 
this end, we have considered a simplified abstract machine and taken the same 
conceptual steps as in our earlier joint work with Dzafic and Pfenning [6,9,11]. 
This earlier work is formalized in Elf, whereas the present work is not (yet). 
The rest of this article reports an independent foray. In the next section, we ad- 
apt the stack machine to CPS programs with first-class continuations, thereby 
formalizing an empirical implementation strategy for first-class continuations. 

4 A Stack Machine for CPS Programs with First-Class 
Continuations 

First-class continuations occur because of call/cc. The call-by-value CPS
transformation of call/cc reads as follows.

\[
[\![\mathrm{call/cc}\ e]\!]^{\mathrm{Exp}}_{\mathrm{cps}}\, c = [\![e]\!]^{\mathrm{Exp}}_{\mathrm{cps}}\, (\lambda f.f\ (\lambda x.\lambda k.c\ x)\ c)
\quad \mbox{where } f, x, \mbox{ and } k \mbox{ are fresh.}
\]

On the right-hand side of this definitional equation, $c$ occurs twice: once as a
regular, second-class continuation, and once more, in $\lambda x.\lambda k.c\ x$. In that term, $k$
is declared but not used: $c$ is used instead and denotes a first-class continuation.







Such CPS programs do not satisfy the judgments of Figures 5 and 6. And indeed, 
Danvy and Lawall observed that in a CPS program, first-class continuations 
can be detected through continuation identifiers occurring “out of turn”, so to 
speak [8]. 

Because it makes no assumptions on the binding discipline of continuation 
identifiers, the standard machine of Section 2, page 91, properly handles CPS 
programs with first-class continuations. First-class continuations, however, dis- 
qualify the stack machine of Section 3, page 94. 

The goal of this section is to develop a stack machine for CPS programs with
first-class continuations. To this end, we formalize what it means for a
continuation identifier to occur in first-class position. We also prove that arbitrary
$\beta$-reduction never promotes a continuation identifier occurring in second-class
position into one occurring in first-class position. We then rephrase the BNF
of CPS programs to single out continuation identifiers occurring in first-class
position and their declaration. And similarly to Section 3, we tag with $\star$ all
the declarations of continuation identifiers occurring in second-class position or
not occurring at all, and all second-class positions of continuation identifiers
(Section 4.1). We then present a stack machine for these 1CPS programs that
copies the stack when first-class continuation abstractions are invoked. We prove
it equivalent to the standard machine of Figure 4 (Section 4.2).

4.1 One Continuation Identifier is Not Enough 

Following Danvy and Lawall [8], we now say that a continuation identifier occurs 
in first-class position whenever it occurs elsewhere than in second-class position, 
which is syntactically easy to detect. We formalize first-class occurrences with 
the judgment displayed in Figure 9. 



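\[
\dfrac{k \in \mathrm{FC}(t_0)}{k \models^{\mathrm{CExp}}_{\mathrm{1Cont}} t_0\ t_1\ c}
\qquad
\dfrac{k \in \mathrm{FC}(t_1)}{k \models^{\mathrm{CExp}}_{\mathrm{1Cont}} t_0\ t_1\ c}
\qquad
\dfrac{k \models^{\mathrm{Cont}}_{\mathrm{1Cont}} c}{k \models^{\mathrm{CExp}}_{\mathrm{1Cont}} t_0\ t_1\ c}
\]

\[
\dfrac{k \models^{\mathrm{Cont}}_{\mathrm{1Cont}} c}{k \models^{\mathrm{CExp}}_{\mathrm{1Cont}} c\ t}
\qquad
\dfrac{k \in \mathrm{FC}(t)}{k \models^{\mathrm{CExp}}_{\mathrm{1Cont}} c\ t}
\qquad
\dfrac{k \models^{\mathrm{CExp}}_{\mathrm{1Cont}} e}{k \models^{\mathrm{Cont}}_{\mathrm{1Cont}} \lambda v.e}
\]

Fig. 9. Characterization of a first-class continuation abstraction $\lambda k.e$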



Definition 4 (First-class position, first-class continuations). In a
continuation abstraction $\lambda k.e$, we say that $k$ occurs in first-class position and denotes
a first-class continuation whenever the judgment $k \models^{\mathrm{CExp}}_{\mathrm{1Cont}} e$ is satisfied.

N.B. For any continuation abstraction $\lambda k.e$, at most one of
$k \models^{\mathrm{CExp}}_{\mathrm{2Cont}} e$ and $k \models^{\mathrm{CExp}}_{\mathrm{1Cont}} e$ is satisfied.

In Section 3, we stated that 2Cont-validity is closed under $\beta$-reduction.
Similarly here, $\beta$-reduction may demote a first-class continuation identifier into a
second-class one, but it can never promote a second-class continuation
identifier into a first-class one. The corresponding formal statement and its proof are
straightforward and omitted here: we rely on them in the proof of Theorem 2.
For example, in

\[ \lambda k.(\lambda x.\lambda k'.k\ x)\ \ell\ k \]

$k$ occurs in first-class position. However, $\beta$-reducing this term yields

\[ \lambda k.k\ \ell \]

where $k$ occurs in second-class position.
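
The judgment of Figure 9 is syntax-directed, so it can be checked by a direct traversal. The Python sketch below is illustrative only: the tuple encoding and the names `free_conts`, `first_class`, and `first_class_cont` are hypothetical, and it assumes that FC(t) denotes the set of continuation identifiers occurring free in the trivial term t. It is applied to the two terms of the example above.

```python
def free_conts(term):
    """Continuation identifiers occurring free in a term (assumed reading of FC)."""
    tag = term[0]
    if tag in ("lit", "var"):
        return set()
    if tag == "kvar":                       # a continuation identifier k
        return {term[1]}
    if tag == "lam":                        # (lam x k e) binds k in e
        _, _x, k, body = term
        return free_conts(body) - {k}
    if tag == "klam":                       # (klam v e)
        return free_conts(term[2])
    # "app" (t0 t1 c) and "ret" (c t): union over the components
    result = set()
    for part in term[1:]:
        result |= free_conts(part)
    return result


def first_class(k, e):
    """Does k occur in first-class position in the serious expression e?"""
    if e[0] == "app":
        _, t0, t1, c = e
        return (k in free_conts(t0) or k in free_conts(t1)
                or first_class_cont(k, c))
    _, c, t = e
    return first_class_cont(k, c) or k in free_conts(t)


def first_class_cont(k, c):
    # A bare identifier in continuation position is a second-class occurrence.
    return c[0] == "klam" and first_class(k, c[2])


# Body of the first example: (lam x k' (k x)) l k
before = ("app",
          ("lam", "x", "k'", ("ret", ("kvar", "k"), ("var", "x"))),
          ("lit", 0),
          ("kvar", "k"))
# Body after beta-reduction: k l
after = ("ret", ("kvar", "k"), ("lit", 0))
```

With these definitions, `first_class("k", before)` holds because `k` occurs free inside a trivial term, while `first_class("k", after)` does not.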

In Section 3, we capitalized on the fact that each second-class position was
uniquely determined. Here, we still capitalize on this fact by only singling out
continuation identifiers in first-class position.^

Introduction: For all continuation abstractions $\lambda k.e$ satisfying
$k \models^{\mathrm{CExp}}_{\mathrm{1Cont}} e$, we tag the declaration of $k$ (written $\lambda^1 k.e$)
and we keep the name $k$. Otherwise, we replace it with $\star$.

Elimination: When a continuation identifier occurs, if it is the latest one
declared, we replace it with $\star$; otherwise, we keep its name.

The resulting BNF for 1CPS programs is displayed in Figure 10. The back and
forth translation functions are displayed in Figures 11 and 12. They generalize
their counterparts in Section 3.

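Lemma 4 (Inverseness of stripping and naming).

$\forall p \in \mathrm{CProg}$, $[\![[\![p]\!]^{\mathrm{CProg}}_{\mathrm{strip}}]\!]^{\mathrm{1CProg}}_{\mathrm{name}} \equiv_\alpha p$
and $\forall p' \in \mathrm{1CProg}$, $[\![[\![p']\!]^{\mathrm{1CProg}}_{\mathrm{name}}]\!]^{\mathrm{CProg}}_{\mathrm{strip}} = p'$.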



4.2 A Stack Machine for CPS Programs with First-Class
Continuations

We handle first-class continuations by extending the formalization of Section 3
with a new syntactic form:

$c \in \mathrm{1Cont}$ - continuations $c ::= \lambda v.e \mid \star \mid k \mid \mathrm{swap}\ \varphi$

The new form $\mathrm{swap}\ \varphi$ makes it possible to represent a copy of the control stack
$\varphi$. It requires us to extend control-stack substitution as follows.

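Definition 5 (Control-stack substitution for 1CPS programs). Given a
stack $\varphi$ of 1Cont continuations, the stack substitution of any $e \in \mathrm{1CExp}$
(resp. $c \in \mathrm{1Cont}$), noted $e\{\varphi\}_1$ (resp. $c\{\varphi\}_1$), yields a CExp
expression (resp. a Cont continuation) and is defined as follows.

\[
\begin{array}{rcl}
(t_0\ t_1\ c)\{\varphi\}_1 &=& [\![t_0]\!]^{\mathrm{1CTriv}}_{\mathrm{name}}\ [\![t_1]\!]^{\mathrm{1CTriv}}_{\mathrm{name}}\ (c\{\varphi\}_1) \\
(c\ t)\{\varphi\}_1 &=& (c\{\varphi\}_1)\ [\![t]\!]^{\mathrm{1CTriv}}_{\mathrm{name}} \\
(\lambda v.e)\{\varphi\}_1 &=& \lambda v.(e\{\varphi\}_1) \\
\star\{\bullet\}_1 &=& k_{\mathrm{init}} \\
\star\{\varphi, \lambda v.e\}_1 &=& \lambda v.(e\{\varphi\}_1) \\
k\{\varphi\}_1 &=& k \\
(\mathrm{swap}\ \varphi')\{\varphi\}_1 &=& \star\{\varphi'\}_1
\end{array}
\]

^ Andrzej Filinski suggested this concise notation (personal communication, Aarhus,
Denmark, summer 1999).

\[
\begin{array}{lll}
p \in \mathrm{1CProg} & \mbox{- 1CPS programs} & p ::= \lambda\star.e \mid \lambda^1 k.e \\
e \in \mathrm{1CExp} & \mbox{- 1CPS (serious) expressions} & e ::= t_0\ t_1\ c \mid c\ t \\
t \in \mathrm{1CTriv} & \mbox{- 1CPS trivial expressions} & t ::= \ell \mid x \mid v \mid \lambda x.\lambda\star.e \mid \lambda x.\lambda^1 k.e \\
c \in \mathrm{1Cont} & \mbox{- continuations} & c ::= \lambda v.e \mid \star \mid k \\
\ell \in \mathrm{Lit} & \mbox{- literals} & \\
x \in \mathrm{Ide} & \mbox{- source identifiers} & \\
k \in \mathrm{IdeC} & \mbox{- fresh continuation identifiers} & \\
\star \in \mathrm{Token} & \mbox{- single continuation identifier} & \\
v \in \mathrm{IdeV} & \mbox{- fresh parameters of continuations} & \\
a \in \mathrm{1Answer} & \mbox{- 1CPS answers} & a ::= \ell \mid \lambda x.\lambda\star.e \mid \lambda x.\lambda^1 k.e \mid \mathrm{error}
\end{array}
\]

Fig. 10. BNF of 1CPS programs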





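\[
\begin{array}{rcl}
[\![\lambda k.e]\!]^{\mathrm{CProg}}_{\mathrm{strip}} &=&
  \left\{\begin{array}{ll}
  \lambda^1 k.[\![e]\!]^{\mathrm{CExp}}_{\mathrm{strip}}k & \mbox{if } k \models^{\mathrm{CExp}}_{\mathrm{1Cont}} e \\
  \lambda\star.[\![e]\!]^{\mathrm{CExp}}_{\mathrm{strip}}k & \mbox{otherwise}
  \end{array}\right. \\[2ex]
[\![t_0\ t_1\ c]\!]^{\mathrm{CExp}}_{\mathrm{strip}}k &=& [\![t_0]\!]^{\mathrm{CTriv}}_{\mathrm{strip}}\ [\![t_1]\!]^{\mathrm{CTriv}}_{\mathrm{strip}}\ ([\![c]\!]^{\mathrm{Cont}}_{\mathrm{strip}}k) \\
[\![c\ t]\!]^{\mathrm{CExp}}_{\mathrm{strip}}k &=& ([\![c]\!]^{\mathrm{Cont}}_{\mathrm{strip}}k)\ [\![t]\!]^{\mathrm{CTriv}}_{\mathrm{strip}} \\[2ex]
[\![\ell]\!]^{\mathrm{CTriv}}_{\mathrm{strip}} &=& \ell \\
[\![x]\!]^{\mathrm{CTriv}}_{\mathrm{strip}} &=& x \\
[\![v]\!]^{\mathrm{CTriv}}_{\mathrm{strip}} &=& v \\
[\![\lambda x.\lambda k.e]\!]^{\mathrm{CTriv}}_{\mathrm{strip}} &=&
  \left\{\begin{array}{ll}
  \lambda x.\lambda^1 k.[\![e]\!]^{\mathrm{CExp}}_{\mathrm{strip}}k & \mbox{if } k \models^{\mathrm{CExp}}_{\mathrm{1Cont}} e \\
  \lambda x.\lambda\star.[\![e]\!]^{\mathrm{CExp}}_{\mathrm{strip}}k & \mbox{otherwise}
  \end{array}\right. \\[2ex]
[\![\lambda v.e]\!]^{\mathrm{Cont}}_{\mathrm{strip}}k &=& \lambda v.[\![e]\!]^{\mathrm{CExp}}_{\mathrm{strip}}k \\
[\![k']\!]^{\mathrm{Cont}}_{\mathrm{strip}}k &=&
  \left\{\begin{array}{ll}
  \star & \mbox{if } k = k' \\
  k' & \mbox{otherwise}
  \end{array}\right.
\end{array}
\]

Fig. 11. Translation from CPS to 1CPS - stripping continuation identifiers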



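\[
\begin{array}{rcll}
[\![\lambda\star.e]\!]^{\mathrm{1CProg}}_{\mathrm{name}} &=& \lambda k.[\![e]\!]^{\mathrm{1CExp}}_{\mathrm{name}}k & \mbox{where } k \mbox{ is fresh} \\
[\![\lambda^1 k.e]\!]^{\mathrm{1CProg}}_{\mathrm{name}} &=& \lambda k.[\![e]\!]^{\mathrm{1CExp}}_{\mathrm{name}}k & \\[2ex]
[\![t_0\ t_1\ c]\!]^{\mathrm{1CExp}}_{\mathrm{name}}k &=& [\![t_0]\!]^{\mathrm{1CTriv}}_{\mathrm{name}}\ [\![t_1]\!]^{\mathrm{1CTriv}}_{\mathrm{name}}\ ([\![c]\!]^{\mathrm{1Cont}}_{\mathrm{name}}k) & \\
[\![c\ t]\!]^{\mathrm{1CExp}}_{\mathrm{name}}k &=& ([\![c]\!]^{\mathrm{1Cont}}_{\mathrm{name}}k)\ [\![t]\!]^{\mathrm{1CTriv}}_{\mathrm{name}} & \\[2ex]
[\![\ell]\!]^{\mathrm{1CTriv}}_{\mathrm{name}} &=& \ell & \\
[\![x]\!]^{\mathrm{1CTriv}}_{\mathrm{name}} &=& x & \\
[\![v]\!]^{\mathrm{1CTriv}}_{\mathrm{name}} &=& v & \\
[\![\lambda x.\lambda\star.e]\!]^{\mathrm{1CTriv}}_{\mathrm{name}} &=& \lambda x.\lambda k.[\![e]\!]^{\mathrm{1CExp}}_{\mathrm{name}}k & \mbox{where } k \mbox{ is fresh} \\
[\![\lambda x.\lambda^1 k.e]\!]^{\mathrm{1CTriv}}_{\mathrm{name}} &=& \lambda x.\lambda k.[\![e]\!]^{\mathrm{1CExp}}_{\mathrm{name}}k & \\[2ex]
[\![\lambda v.e]\!]^{\mathrm{1Cont}}_{\mathrm{name}}k &=& \lambda v.[\![e]\!]^{\mathrm{1CExp}}_{\mathrm{name}}k & \\
[\![\star]\!]^{\mathrm{1Cont}}_{\mathrm{name}}k &=& k & \\
[\![k']\!]^{\mathrm{1Cont}}_{\mathrm{name}}k &=& k' & \\[2ex]
[\![\ell]\!]^{\mathrm{1Answer}}_{\mathrm{name}} &=& \ell & \\
[\![\lambda x.\lambda\star.e]\!]^{\mathrm{1Answer}}_{\mathrm{name}} &=& \lambda x.\lambda k.[\![e]\!]^{\mathrm{1CExp}}_{\mathrm{name}}k & \mbox{where } k \mbox{ is fresh} \\
[\![\lambda x.\lambda^1 k.e]\!]^{\mathrm{1Answer}}_{\mathrm{name}} &=& \lambda x.\lambda k.[\![e]\!]^{\mathrm{1CExp}}_{\mathrm{name}}k & \\
[\![\mathrm{error}]\!]^{\mathrm{1Answer}}_{\mathrm{name}} &=& \mathrm{error} &
\end{array}
\]

Fig. 12. Translation from 1CPS to CPS - naming continuation identifiers

\[
\dfrac{\bullet \vdash^{\mathrm{1CExp}}_{\mathrm{1cc}} e \hookrightarrow a}
      {\vdash^{\mathrm{1CProg}}_{\mathrm{1cc}} \lambda\star.e \hookrightarrow a}
\qquad
\dfrac{\bullet \vdash^{\mathrm{1CExp}}_{\mathrm{1cc}} e[\mathrm{swap}\ \bullet/k] \hookrightarrow a}
      {\vdash^{\mathrm{1CProg}}_{\mathrm{1cc}} \lambda^1 k.e \hookrightarrow a}
\qquad
\dfrac{}{\varphi \vdash^{\mathrm{1CExp}}_{\mathrm{1cc}} \ell\ t\ c \hookrightarrow \mathrm{error}}
\]

\[
\dfrac{\varphi \vdash^{\mathrm{1CExp}}_{\mathrm{1cc}} e[t/x] \hookrightarrow a}
      {\varphi \vdash^{\mathrm{1CExp}}_{\mathrm{1cc}} (\lambda x.\lambda\star.e)\ t\ \star \hookrightarrow a}
\qquad
\dfrac{\varphi, \lambda v.e' \vdash^{\mathrm{1CExp}}_{\mathrm{1cc}} e[t/x] \hookrightarrow a}
      {\varphi \vdash^{\mathrm{1CExp}}_{\mathrm{1cc}} (\lambda x.\lambda\star.e)\ t\ (\lambda v.e') \hookrightarrow a}
\]

\[
\dfrac{\varphi \vdash^{\mathrm{1CExp}}_{\mathrm{1cc}} e[t/x, \mathrm{swap}\ \varphi/k] \hookrightarrow a}
      {\varphi \vdash^{\mathrm{1CExp}}_{\mathrm{1cc}} (\lambda x.\lambda^1 k.e)\ t\ \star \hookrightarrow a}
\qquad
\dfrac{\varphi, \lambda v.e' \vdash^{\mathrm{1CExp}}_{\mathrm{1cc}} e[t/x, \mathrm{swap}\ (\varphi, \lambda v.e')/k] \hookrightarrow a}
      {\varphi \vdash^{\mathrm{1CExp}}_{\mathrm{1cc}} (\lambda x.\lambda^1 k.e)\ t\ (\lambda v.e') \hookrightarrow a}
\]

\[
\dfrac{\varphi' \vdash^{\mathrm{1CExp}}_{\mathrm{1cc}} e[t/x] \hookrightarrow a}
      {\varphi \vdash^{\mathrm{1CExp}}_{\mathrm{1cc}} (\lambda x.\lambda\star.e)\ t\ (\mathrm{swap}\ \varphi') \hookrightarrow a}
\qquad
\dfrac{\varphi' \vdash^{\mathrm{1CExp}}_{\mathrm{1cc}} e[t/x, \mathrm{swap}\ \varphi'/k] \hookrightarrow a}
      {\varphi \vdash^{\mathrm{1CExp}}_{\mathrm{1cc}} (\lambda x.\lambda^1 k.e)\ t\ (\mathrm{swap}\ \varphi') \hookrightarrow a}
\]

\[
\dfrac{\varphi \vdash^{\mathrm{1CExp}}_{\mathrm{1cc}} e[t/v] \hookrightarrow a}
      {\varphi \vdash^{\mathrm{1CExp}}_{\mathrm{1cc}} (\lambda v.e)\ t \hookrightarrow a}
\qquad
\dfrac{}{\bullet \vdash^{\mathrm{1CExp}}_{\mathrm{1cc}} \star\ t \hookrightarrow t}
\qquad
\dfrac{\varphi \vdash^{\mathrm{1CExp}}_{\mathrm{1cc}} e[t/v] \hookrightarrow a}
      {\varphi, \lambda v.e \vdash^{\mathrm{1CExp}}_{\mathrm{1cc}} \star\ t \hookrightarrow a}
\]

\[
\dfrac{}{\varphi \vdash^{\mathrm{1CExp}}_{\mathrm{1cc}} \mathrm{swap}\ \bullet\ t \hookrightarrow t}
\qquad
\dfrac{\varphi' \vdash^{\mathrm{1CExp}}_{\mathrm{1cc}} e[t/v] \hookrightarrow a}
      {\varphi \vdash^{\mathrm{1CExp}}_{\mathrm{1cc}} \mathrm{swap}\ (\varphi', \lambda v.e)\ t \hookrightarrow a}
\]

Fig. 13. Stack machine for 1CPS programs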



Figure 13 displays a stack-based abstract machine for 1CPS programs. This
machine is a version of the stack machine of Section 3 where the substitution
for continuation identifiers occurring in second-class position or not occurring
at all is implemented with a global control stack (as in Figure 8), and where
the substitution for continuation identifiers occurring in first-class position is
implemented by copying the stack into a swap form (which is new).

Calls: When a function declaring a second-class continuation is applied, its
continuation is pushed on $\varphi$. When a function declaring a first-class continuation
is applied, its continuation is also pushed on $\varphi$ and the resulting new stack
is copied into a swap form.

Returns: When a continuation is needed, it is popped from $\varphi$. If $\varphi$ is empty,
the intermediate result sent to the continuation is the final answer. When a
swap form is encountered, its copy of $\varphi$ is restored.
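
The call and return discipline just described can be sketched as an interpreter loop in Python. This is an illustration under assumptions, not the formal machine: the tuple encoding, the tags `lam1`, `kvar`, and `swap`, and the function names are hypothetical, programs are assumed closed, and the control stack is an immutable tuple so that copying it into a swap form amounts to sharing it.

```python
def subst(term, kind, name, value):
    """Replace free occurrences of the identifier (kind, name) by the closed `value`."""
    tag = term[0]
    if tag in ("var", "kvar"):
        return value if (tag, term[1]) == (kind, name) else term
    if tag in ("lit", "star", "swap"):      # swap forms hold closed stacks
        return term
    if tag == "lam":                        # (lam x e): second-class continuation
        _, x, body = term
        if (kind, name) == ("var", x):
            return term
        return ("lam", x, subst(body, kind, name, value))
    if tag == "lam1":                       # (lam1 x k e): first-class continuation k
        _, x, k, body = term
        if (kind, name) in (("var", x), ("kvar", k)):
            return term
        return ("lam1", x, k, subst(body, kind, name, value))
    if tag == "klam":                       # (klam v e)
        _, v, body = term
        if (kind, name) == ("var", v):
            return term
        return ("klam", v, subst(body, kind, name, value))
    # "app" (t0 t1 c) and "ret" (c t): recurse into every component
    return (tag,) + tuple(subst(part, kind, name, value) for part in term[1:])


def run(e):
    """Evaluate the body e of a 1CPS program whose continuation is second-class."""
    stack = ()                              # control stack, top at the right
    while True:
        if e[0] == "app":                   # t0 t1 c
            _, t0, t1, c = e
            if t0[0] == "lit":
                return "error"
            if c[0] == "klam":
                stack = stack + (c,)        # call: push the continuation
            elif c[0] == "swap":
                stack = c[1]                # restore the captured stack
            body = subst(t0[-1], "var", t0[1], t1)
            if t0[0] == "lam1":             # capture: copy the stack into a swap form
                body = subst(body, "kvar", t0[2], ("swap", stack))
            e = body
        else:                               # c t
            _, c, t = e
            if c[0] == "klam":
                e = subst(c[2], "var", c[1], t)
                continue
            if c[0] == "swap":
                stack = c[1]                # throw: restore the captured stack
            if not stack:
                return t                    # empty stack: final answer
            (_, v, body), stack = stack[-1], stack[:-1]
            e = subst(body, "var", v, t)


k_w = ("klam", "w", ("ret", ("star",), ("var", "w")))
k_v = ("klam", "v", ("ret", ("star",), ("lit", 0)))
# The captured k is invoked below a pending continuation k_v, which is discarded.
escaping = ("app",
            ("lam1", "x", "k",
             ("app", ("lam", "y", ("ret", ("kvar", "k"), ("var", "y"))),
              ("lit", 9), k_v)),
            ("lit", 5), k_w)
# The same program returning normally through k_v.
returning = ("app",
             ("lam", "x",
              ("app", ("lam", "y", ("ret", ("star",), ("var", "y"))),
               ("lit", 9), k_v)),
             ("lit", 5), k_w)
```

In `escaping`, invoking the captured continuation restores the stack as it was at capture time and the answer is the literal 9; in `returning`, the pending continuation `k_v` runs and the answer is the literal 0.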



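More formally, the judgment

\[
\vdash^{\mathrm{1CProg}}_{\mathrm{1cc}} p \hookrightarrow a
\]

is satisfied whenever a CPS program $p \in \mathrm{1CProg}$ evaluates to an answer
$a \in \mathrm{1Answer}$. The auxiliary judgment

\[
\varphi \vdash^{\mathrm{1CExp}}_{\mathrm{1cc}} e \hookrightarrow a
\]

is satisfied whenever an expression $e \in \mathrm{1CExp}$ evaluates to an answer $a$, given a
control stack $\varphi \in \mathrm{1CStack}$. The machine starts and stops with an empty control
stack.

We prove the equivalence between the stack machine and the standard
machine as in Section 3.2.

Theorem 2 (Simulation). The stack machine of Figure 13 and the standard
machine are equivalent:

1. $\vdash^{\mathrm{CProg}}_{\mathrm{std}} p \hookrightarrow a$ if and only if
   $\vdash^{\mathrm{1CProg}}_{\mathrm{1cc}} [\![p]\!]^{\mathrm{CProg}}_{\mathrm{strip}} \hookrightarrow [\![a]\!]^{\mathrm{Answer}}_{\mathrm{strip}}$.

2. $\vdash^{\mathrm{CExp}}_{\mathrm{std}} [\![e]\!]^{\mathrm{CExp}}_{\mathrm{strip}}k\,\{\varphi\}_1 \hookrightarrow a$
   if and only if
   $\varphi \vdash^{\mathrm{1CExp}}_{\mathrm{1cc}} [\![e]\!]^{\mathrm{CExp}}_{\mathrm{strip}}k \hookrightarrow [\![a]\!]^{\mathrm{Answer}}_{\mathrm{strip}}$,
   for some $k$.

Proof. Similar to the proof of Theorem 1. □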



4.3 Summary and Conclusion 

We have formalized and proven correct a stack machine for CPS programs with 
first-class continuations. This machine is idealized in that, e.g., it has no provision 
for stack overflow. Nevertheless, it embodies the most classical implementation 
strategy for first-class continuations: the stack is copied at call/cc time, i.e., 
in the CPS world, when a first-class continuation identifier is declared; and 
conversely, the stack is restored at throw time, i.e., in the CPS world, when 
a first-class continuation identifier is invoked. This design keeps second-class 
continuations costless - in fact it is a zero-overhead strategy in the sense of 
Clinger, Hartheimer, and Ost [4, Section 3.1]: only programs using first-class 
continuations pay for them. 

Furthermore, and as in Section 3, our representation of $\varphi$ embodies its LIFO
nature without committing to an actual representation. This representation can
be retentive (in which case $\varphi$ is implemented as a pointer into the heap) or
destructive (in which case $\varphi$ is implemented as, e.g., a rewriteable array) [3]. In both
cases, $\mathrm{swap}\ \varphi$ is implemented as copying $\varphi$. Copying the pointer lets captured
continuations be shared, whereas copying the array yields multiple representations
of captured continuations.
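
The difference between the two representations can be made concrete with a small Python sketch. The encodings are assumptions chosen for illustration: the retentive stack is an immutable linked list of pairs, the destructive stack is a Python list mutated in place, and the frame names are placeholders.

```python
# Retentive: the stack is an immutable linked list; None is the empty stack.
def push(stack, frame):
    return (frame, stack)

linked = push(push(None, "k1"), "k2")
captured_linked = linked          # swap: copy the pointer only
linked = push(linked, "k3")       # later pushes leave the capture untouched

# Destructive: the stack is an array that is updated in place.
array = ["k1", "k2"]
captured_array = list(array)      # swap: copy the whole array
array.append("k3")
```

After these steps the tail of `linked` is the very object held in `captured_linked`, so the capture is shared, whereas `captured_array` is a separate copy of the frames.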

5 A Segmented Stack Machine for First-Class 
Continuations 

Coroutines and threads are easily simulated using call/cc, but these simulations
are allergic to representing control as a rewriteable array. Indeed, for every switch
this array is copied in the heap, so that multiple copies coexist without
sharing, even though these copies are mostly identical.

Against this backdrop, implementations such as PC Scheme [2] segment the
stack, using the top segment as a stack cache: if this cache overflows, it is flushed
to the heap and the computation starts afresh with an empty cache; and if it
underflows, the last flushed cache is restored. Flushed caches are linked LIFO
in the heap.* A segmented stack accommodates call/cc and throw very simply: at
call/cc time, the cache is flushed to the heap and a pointer to it is retained; and
at throw time, the flushed cache that is pointed to is restored. As for the bulk of
the continuations, it is not copied but shared between captured continuations.

* If the size of the stack cache is one, the segmented implementation coincides with a
heap implementation.

It is simple to expand the stack machine of Section 4 into a segmented stack
machine. One simply needs to define a judgment over $\Phi$, $\varphi$, $e$, and $a$,
where $\varphi$, $e$, and $a$ are as in Section 4 and $\Phi$ denotes a LIFO list of $\varphi$'s. (One also
needs an overflow predicate for $\varphi$.)
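
As an informal companion to this description, here is a Python sketch of a segmented control stack. It is not derived from PC Scheme or from the formalization: the class name, the fixed `capacity` used as the overflow predicate, and the encoding of flushed segments as a linked list of tuples are all assumptions.

```python
class SegmentedStack:
    """A stack cache backed by a LIFO list of flushed segments in the heap."""

    def __init__(self, capacity):
        self.capacity = capacity    # overflow predicate: the cache holds this many frames
        self.cache = []             # top segment, updated in place
        self.heap = None            # flushed segments: (frames, older segments) or None

    def _flush(self):
        if self.cache:
            self.heap = (tuple(self.cache), self.heap)
            self.cache = []

    def push(self, frame):
        if len(self.cache) == self.capacity:    # overflow: flush, restart with empty cache
            self._flush()
        self.cache.append(frame)

    def pop(self):
        if not self.cache:                      # underflow: restore the last flushed cache
            if self.heap is None:
                return None                     # the whole stack is empty
            frames, self.heap = self.heap
            self.cache = list(frames)
        return self.cache.pop()

    def capture(self):
        """call/cc time: flush the cache and retain a pointer to it."""
        self._flush()
        return self.heap

    def reinstate(self, pointer):
        """throw time: restore the flushed cache that is pointed to."""
        self.cache = []
        self.heap = pointer
        if pointer is not None:
            frames, self.heap = pointer
            self.cache = list(frames)


stack = SegmentedStack(capacity=2)
for frame in ("a", "b", "c"):
    stack.push(frame)
k = stack.capture()
stack.push("d")
popped = [stack.pop() for _ in range(3)]
stack.reinstate(k)
replayed = [stack.pop() for _ in range(4)]
```

Only the top segment is copied when the captured pointer is reinstated; the older segments remain shared in the heap.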

Thus equipped, it is also simple to expand the stack substitution of Section 4,
and to state and prove a simulation theorem similar to Theorem 2, thereby
formalizing what Clinger, Hartheimer, and Ost name the "chunked-stack
strategy" [4]. Another moderate effort makes it possible to formalize the author's
incremental garbage collection of unshared continuations by one-bit reference
counting [5]. One is also in a position to formalize "one-shot continuations" [14].

Acknowledgments: I am grateful to Belmina Dzafic and Frank Pfenning for our
joint work, which forms the foundation of the present foray. Throughout, and
as always, Andrzej Filinski has been a precious source of sensible comments and 
suggestions. This article has also benefited from the interest and comments of 
Lars R. Clausen, Daniel Damian, Bernd Grobauer, Niels O. Jensen, Julia L. 
Lawall, Lasse R. Nielsen, Morten Rhiger, and Zhe Yang. I am also grateful for 
the opportunity to have presented this work at Marktoberdorf, at the University 
of Tokyo, and at KAIST in the summer and in the fall of 1999. Finally, thanks 
are due to the anonymous referees for stressing the issue of retention vs. deletion. 

References 

1. Andrew W. Appel and David B. MacQueen. Standard ML of New Jersey. In
Third International Symposium on Programming Language Implementation and
Logic Programming, number 528 in Lecture Notes in Computer Science, pages 1-13,
Passau, Germany, August 1991.

2. David B. Bartley and John C. Jensen. The implementation of PC Scheme. In 
Proceedings of the 1986 ACM Conference on Lisp and Functional Programming, 
pages 86-93, Cambridge, Massachusetts, August 1986. 

3. Daniel M. Berry. Block structure: Retention or deletion? (extended abstract). In 
Conference Record of the Third Annual ACM Symposium on Theory of Computing, 
pages 86-100, Shaker Heights, Ohio, May 1971. 

4. William Clinger, Anne H. Hartheimer, and Eric M. Ost. Implementation strategies
for first-class continuations. Higher-Order and Symbolic Computation, 12(1):7-45,
1999.

5. Olivier Danvy. Memory allocation and higher-order functions. In Proceedings of 
the ACM SIGPLAN’87 Symposium on Interpreters and Interpretive Techniques, 
SIGPLAN Notices, Vol. 22, No 7, pages 241-252, Saint-Paul, Minnesota, June 
1987. 

6. Olivier Danvy, Belmina Dzafic, and Frank Pfenning. On proving syntactic pro- 
perties of CPS programs. In Third International Workshop on Higher-Order Ope- 
rational Techniques in Semantics, volume 26 of Electronic Notes in Theoretical 
Computer Science, pages 19-31, Paris, France, September 1999. 







7. Olivier Danvy and Andrzej Filinski. Representing control, a study of the CPS 
transformation. Mathematical Structures in Computer Science, 2(4):361-391, 1992. 

8. Olivier Danvy and Julia L. Lawall. Back to direct style II: First-class continuations. 
In Proceedings of the 1992 ACM Conference on Lisp and Functional Programming, 
LISP Pointers, Vol. V, No. 1, pages 299-310, San Francisco, California, June 1992. 

9. Olivier Danvy and Frank Pfenning. The occurrence of continuation parameters 
in CPS terms. Technical report CMU-CS-95-121, School of Computer Science, 
Carnegie Mellon University, Pittsburgh, Pennsylvania, February 1995. 

10. Bruce F. Duba, Robert Harper, and David B. MacQueen. Typing first-class
continuations in ML. In Proceedings of the Eighteenth Annual ACM Symposium on
Principles of Programming Languages, pages 163-173, Orlando, Florida, January
1991.

11. Belmina Dzafic. Formalizing program transformations. Master’s thesis, DAIMI, 
Department of Computer Science, University of Aarhus, Aarhus, Denmark, De- 
cember 1998. 

12. Matthias Felleisen. The Calculi of λ-v-CS Conversion: A Syntactic Theory of
Control and State in Imperative Higher-Order Programming Languages. PhD thesis,
Department of Computer Science, Indiana University, Bloomington, Indiana,
August 1987.

13. Cormac Flanagan, Amr Sabry, Bruce F. Duba, and Matthias Felleisen. The essence 
of compiling with continuations. In Proceedings of the ACM SIGPLAN’93 Confe- 
rence on Programming Languages Design and Implementation, SIGPLAN Notices, 
Vol. 28, No 6, pages 237-247, Albuquerque, New Mexico, June 1993. 

14. Christopher T. Haynes and Daniel P. Friedman. Embedding continuations in 
procedural objects. ACM Transactions on Programming Languages and Systems, 
9(4):582-598, 1987. 

15. Robert Hieb, R. Kent Dybvig, and Carl Bruggeman. Representing control in the 
presence of first-class continuations. In Proceedings of the ACM SIGPLAN’90 
Conference on Programming Languages Design and Implementation, SIGPLAN 
Notices, Vol. 25, No 6, pages 66-77, White Plains, New York, June 1990. 

16. Richard A. Kelsey and Jonathan A. Rees. A tractable Scheme implementation. 
Lisp and Symbolic Computation, 7(4):315-336, 1994. 

17. David Kranz, Richard Kelsey, Jonathan Rees, Paul Hudak, Jonathan Philbin, and
Norman Adams. Orbit: An optimizing compiler for Scheme. In Proceedings of 
the 1986 Symposium on Compiler Construction, SIGPLAN Notices, Vol. 21, No 7, 
pages 219-233, Palo Alto, California, June 1986. 

18. Peter J. Landin. The mechanical evaluation of expressions. Computer Journal, 
6:308-320, 1964. 

19. Drew McDermott. An efficient environment allocation scheme in an interpreter for 
a lexically-scoped Lisp. In Conference Record of the 1980 LISP Conference, pages 
154-162, Stanford, California, August 1980. 

20. Gordon D. Plotkin. Call-by-name, call-by-value and the λ-calculus. Theoretical
Computer Science, 1:125-159, 1975. 

21. Guy L. Steele Jr. Rabbit: A compiler for Scheme. Technical Report AI-TR-474, Ar- 
tificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, 
Massachusetts, May 1978. 

22. Christopher Strachey and Christopher P. Wadsworth. Continuations: A mathema- 
tical semantics for handling full jumps. Higher-Order and Symbolic Computation, 
13(1/2), 2000. Reprint of the technical monograph PRG-11, Oxford University 
Computing Laboratory (1974). 




Correctness of Java Card Method Lookup via Logical 

Relations 



Ewen Denney and Thomas Jensen 
Projet Lande, IRISA, Rennes Cedex 35042, France 



Abstract. We formalise the Java Card bytecode optimisation from class file to 
CAP file format as a set of constraints between the two formats, and define and 
prove its correctness. Java Card bytecode is formalised as an abstract operational 
semantics, which can then be instantiated into the two formats. The optimisation 
is given as a logical relation such that the instantiated semantics are observably 
equal. The proof has been automated using the Coq theorem prover.



Using a high-level language for programming embedded systems may require a 
transformation phase in order that the compiled code fits on the device. In this paper 
we describe a method for formally proving the correctness of such a transformation. 
The method makes extensive use of types to describe the various run-time structures 
and relies on the notion of logical relation to relate the two representations of the code. 
We present the method in the setting of mapping Java onto smart cards. The Java Card 
language [10] is a trimmed down dialect of Java aimed at programming smart cards. As 
with Java, Java Card is compiled into bytecode, which is then verified and executed on 
a virtual machine [4], installed on a chip on the card itself. However, the memory and
processor limitations of smart cards necessitate a further stage, in which the bytecode 
is optimised from the standard class file format of Java, to the CAP file format [11]. 
The core of this optimisation is a tokenisation in which names are replaced with tokens 
enabling a faster lookup of various entities. 

We describe a semantic framework for proving the correctness of Java Card toke- 
nisation. The basic idea is to give an abstract description of the constraints given in 
the official specification of the tokenisation and show that any transformation satisfying 
these constraints is ‘correct’. This is independent of showing that there actually exists 
a collection of functions satisfying these constraints. This article concentrates on pro- 
ving the correctness of the specification. The formal development of an algorithm is 
the subject of another report. The main advantage of decoupling ‘correctness’ into two 
steps is that we get a more general result: rather than proving the correctness of one 
particular algorithm, we are able to show that the constraints described in Sun’s official 
specification [11] (given certain assumptions) are sufficient. We give a formalisation and 
correctness proof for the part concerned with dynamic method lookup. A comprehensive 
formalisation appears as a technical report [2]. 

1 The Conversion 

Java source code is compiled on a class by class basis into the class file format. By
contrast, Java Card CAP files correspond to packages. They are produced by the
conversion of a collection of class files. In the class file format, methods, fields and so on
are referred to using strings. In CAP files, however, tokens are ascribed to the various
entities. The idea is that if a method, say, is publicly visible*, then it is ascribed a token.
If the method is only visible within its package, then it is referred to directly using an
offset into the relevant data structure. Thus references are either internal or external.
The conversion groups entities from different class files into the components of a CAP
file. For example, all constant pools of the class files forming a package are merged into
one constant pool component, and all method implementations are gathered in the same
method component. One significant difference between the two formats is the way in
which the method tables are arranged. In a class file, the methods item contains all the
information relevant to methods defined in that class. In the CAP file, this information is
shared between the class and method components. The method component contains the
implementation details (i.e. the bytecode) for the methods defined in this package. The
class component is a collection of class information structures. Each of these contains
separate tables for the package and public methods, mapping method tokens to offsets
into the method component. The method tables contain the information necessary for
resolving any method call in that class.

The conversion is presented in [11] as a collection of constraints on the CAP file, 
rather than as an explicit mapping between class and CAP formats. For example, if a class 
inherits a method from a superclass then the conversion can choose to include the method 
token in the relevant method table or, instead, that the table of the superclass should be 
searched. There is a choice, therefore, between copying all inherited methods, or having 
a more compressed table. The specification does not constrain this choice. We adopt 
a simplified definition of the conversion, only considering classes, constant pools, and 
methods (with inheritance and overwriting). In particular, we ignore fields, exceptions 
and interfaces. The conversion also includes a number of mandatory optimisations such 
as the inlining of final fields, and the type-based specialisation of instructions [10,11], 
which we do not treat here. 

2 Overview of Formalisation 

The conversion from class file to CAP format is a transformation between formats of two 
virtual machines. The first issue to be addressed is determining in what sense, exactly, 
the conversion to token format should be regarded as an equivalence. We cannot simply 
say that the JVM and JCVM have the same behaviour for all bytecodes, in class and 
CAP file format respectively, because, a priori, the states of the virtual machines are 
themselves in different formats. Instead, we adopt a simple form of equivalence based 
on the notion of representation independence [5]. This is expressed in terms of so-called 
observable types. This limits us to comparing the two interpretations in terms of words, 
but this is sufficient to observe the operand stack and local variables, where the results 
of execution are stored. 

Representation independence may be proven by defining coupling relations between
the two formats that respect the tokenisation and are the identity at observable types.
This can be seen as formalising a data refinement from class to CAP file. We formalise
the relations nondeterministically as any family of relations that satisfies certain
constraints, rather than as explicit transformations. This is because there are many possible
tokenisations and we wish to prove any reasonable optimisation correct.

* We follow the terminology of [11], where a method is public visible if it has either a protected
or a public modifier, and package visible if it is declared private or has no visibility modifier.

The virtual machines are formalised in an operational style, as transition relations 
over abstract machines. We adopt the action semantics formalism of Mosses [6], using
a mixture of operational and denotational styles: the virtual machines are formalised 
operationally, parameterised with respect to a number of auxiliary functions, which are 
then interpreted denotationally. This modular presentation of the semantics facilitates 
the comparison between the two formats. We illustrate this for dynamic method lookup, 
used in the semantics of the method invocation instructions. The lookup function which 
searches for the implementation of a method is dependent on the layout of the method 
tables. The operational rule giving the semantics of the method invocation instructions, 
presented in Section 5, is parameterised with respect to the lookup function. Then in 
Section 6 two possible interpretations of lookup are given. 
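
The idea of parameterising the invocation rule by the lookup function can be illustrated with a small Python sketch. It does not reproduce the definitions of Sections 5 and 6: the dictionary layout of the two environments, the token numbering, and the names `invoke`, `lookup_name`, and `lookup_token` are hypothetical, and method bodies are modelled as Python functions on integers.

```python
def invoke(lookup, classes, cls, method_ref, argument):
    """Method invocation, parameterised by the lookup function."""
    implementation = lookup(classes, cls, method_ref)
    return implementation(argument)


def lookup_name(classes, cls, name):
    """Name interpretation: search the class hierarchy for a method name."""
    while cls is not None:
        if name in classes[cls]["methods"]:
            return classes[cls]["methods"][name]
        cls = classes[cls]["super"]
    raise LookupError(name)


def lookup_token(classes, cls, token):
    """Token interpretation: index method tables, deferring to the superclass
    when an inherited method was not copied into the table."""
    while cls is not None:
        table = classes[cls]["table"]
        if token < len(table) and table[token] is not None:
            return table[token]
        cls = classes[cls]["super"]
    raise LookupError(token)


def inc(n):
    return n + 1

def twice(n):
    return 2 * n

# Class B extends A; A defines inc, B defines twice.
named = {"A": {"super": None, "methods": {"inc": inc}},
         "B": {"super": "A", "methods": {"twice": twice}}}
# Class tokens: A = 0, B = 1. Method tokens: inc = 0, twice = 1.
tokenised = {0: {"super": None, "table": [inc]},
             1: {"super": 0, "table": [None, twice]}}

by_name = invoke(lookup_name, named, "B", "inc", 1)
by_token = invoke(lookup_token, tokenised, 1, 0, 1)
```

Both invocations return the same observable word, which is the kind of agreement between the name and token interpretations that the correctness statement is about.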

In Section 4, we define abstract types for the various entities converted during 
tokenisation, which are common to the two formats. For example, Class_ref and 
Environment. It is this type structure which is used to define the logical relations. 
In Section 5 we give an operational semantics which is independent of the underlying 
class/CAP file format. The structure of the class/CAP file need not be visible to the 
operational semantics. We need only be able to extract certain data corresponding to 
a particular method, such as the appropriate constant pool. In Section 6, we give the
specific details of the class file and CAP file formats, defined as interpretations of types
and auxiliary functions, $[\![\cdot]\!]_{\mathrm{name}}$ and $[\![\cdot]\!]_{\mathrm{tok}}$. We refer to these as the name and the token
interpretation, respectively.

In Section 7, we define the logical relation, $\{R_\theta\}_{\theta \in \mathrm{Abstract\_type}}$. It is convenient to
group the definition into several levels. First, there are various basic observable types
(byte, short, etc.), $\gamma$, for which we have $R_\gamma = \mathrm{id}_\gamma$. Second, there are the references, $\iota$,
such as package and class references, for which the relation represents the
tokenisation of named items. Third, the constraints on the organisation into components (which
we will call the componentisation) are expressed in $R_\kappa$, where $\kappa$ includes method
information structures, constant pools, and so on. This represents the relationship between
components in CAP files and the corresponding entities in class files. Using the above
three families of relations we can define $R_\theta$ for each type, $\theta$, where

\[
\theta ::= \gamma \mid \iota \mid \kappa \mid \theta \times \theta' \mid \theta \to \theta' \mid \theta + \theta' \mid \theta^*
\]

The family of relations, $\{R_\theta\}_{\theta \in \mathrm{Abstract\_type}}$, represents the overall construction of
components in the CAP file format from a class file. The relations are 'logical' in the
sense that the definitions for defined types follow automatically. For example, we define
the type of the environment that contains the class hierarchy as

\[
\mathrm{Environment} = \mathrm{Package\_ref} \to \mathrm{Package}
\]

and so the definition of $R_{\mathrm{Environment}}$ follows from those of $R_{\mathrm{Package\_ref}}$, $R_{\mathrm{Package}}$ and
the standard construction of $R_{\theta \to \theta'}$; similarly for $R_{\mathrm{Heap}}$.
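
For reference, the usual textbook clauses for logical relations at function and product types read as follows. They are given here as the generic construction; the exact formulation used in Section 7 (for instance, its treatment of partial functions) may differ.

\[
\begin{array}{rcl}
(f, g) \in R_{\theta \to \theta'} &\iff& \forall (a, b) \in R_\theta.\ (f\,a,\ g\,b) \in R_{\theta'} \\
((a, a'), (b, b')) \in R_{\theta \times \theta'} &\iff& (a, b) \in R_\theta \ \wedge\ (a', b') \in R_{\theta'}
\end{array}
\]

Instantiating the first clause at $\mathrm{Environment} = \mathrm{Package\_ref} \to \mathrm{Package}$ gives: two environments $E$ and $E'$ are related whenever $(E\,p,\ E'\,p') \in R_{\mathrm{Package}}$ for all $(p, p') \in R_{\mathrm{Package\_ref}}$.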
3 Related Work

There have been a number of formalisations of the Java Virtual Machine which have
some relevance for our work here on Java Card. Bertelsen [1] gives an operational
semantics which we have used as a starting point. He also considers the verification
conditions, which considerably complicates the rules, however. Pusch has formalised
the JVM in HOL [8]. Like us, she considers the class file to be well-formed so that the
hypotheses of rules are just assignments. The operational semantics is presented directly
as a formalisation in HOL, whereas we have chosen (equivalently) to use inference
rules. All these works make various simplifications and abstractions. However, since
these are formalisations of Java rather than Java Card they do not consider the CAP file
format. In contrast, the work of Lanet and Requet [3] is specifically concerned with Java
Card. They also aim to prove the correctness of Java Card tokenisation. Their work can
be seen as complementing ours. They concentrate on optimisations, including the type
specialisation of instructions, and do not consider the conversion as such. In contrast, we
have specified the conversion but ignored the subsequent optimisations. Their formalism
is based on the B method, so the specification and proof are presented as a series of
refinements. In [7], Pusch proves the correctness of an implementation of Prolog on an
abstract machine, the WAM. The proof structure is similar to ours, although there are
refinements through several levels. There are operational semantics for each level, and
correctness is expressed in terms of equivalence between levels. The differences between
the semantics are significant, since they are not factored out into auxiliary functions as
here. She uses a big-step operational semantics, which is not appropriate for us because
we wish to compare intermediate results. Moreover, she uses an abstraction function on
the initial state, the results being required to be identical, whereas we have a relation for
both initial and final states.



4 Abstract Types 

We use types to structure the transformation. These are not the types of the Java Card 
language, but rather are based on the simply-typed lambda calculus with sums, products
and lists. We use record types with the actual types of fields (drawn from the official 
specification where not too confusing) serving as labels. Occasionally we use terms 
as singleton types, such as 0xFFFF and 0. There are two sorts of types: abstract and
concrete. The idea is that abstract types are those we can think of independently of a 
particular format. The concrete types are the particular realisations of these, as well as 
types which only make sense in one particular model. For example, CP_index is the 
abstract type of indices into a constant pool for a given package. In the name inter- 
pretation, this is modelled by a class name and an index into the constant pool of the 
corresponding class file, i.e. $\mathit{Class\_name} \times \mathit{Index}$ where Index is a concrete type.
In the token interpretation, however, since all the constant pools are merged, we have
$[\![\mathit{CP\_index}]\!]_{tok} = \mathit{Package\_tok} \times \mathit{Index}$. Another example is the various distinctions
that are made between method and field references in CAP files, but not class files, 
and which are not relevant at the level of the operational semantics, which concerns 
terms of abstract types. We arrange the types so that as much as possible is common
between the two formats. For example, it is more convenient to uniformly define environments
as mappings of type $\mathit{Package\_ref} \to \mathit{Package}$, with Package interpreted as
$\mathit{Class\_name} \to \mathit{Class\_file}$ or CAP_file.

There is a ‘type of types’ for the two forms of data type in Java Card — primitive 
types, i.e., the simple types supported directly on the card, and reference types.

$$\mathit{Type} = \{\mathit{Boolean}, \mathit{Byte}, \mathit{Short}\} + \mathit{Reference\_type}$$

$$\mathit{Reference\_type} = \mathit{Array\_type} + \mathit{Class\_ref}$$

We use a separate type, Object_ref, to refer to objects on the heap. The objects themselves
contain a reference to the appropriate class or array of which they form an instance.

The type Word is an abstract unit of storage and is platform specific. All we need 
know is that object references and the basic types, Byte, Short and Boolean, can be
stored in a Word. Rather than use an explicit coercion, we assume

$$\mathit{Word} = \mathit{Object\_ref} + \mathit{Null} + \mathit{Boolean} + \mathit{Byte} + \mathit{Short}.$$

Thus a word is (i.e. represents) either a reference (possibly null) or an element of a 
primitive type. Furthermore, we define Value = Word. Although this is not strictly 
necessary, there is a conceptual distinction. If we were to introduce values of type int, 
then a value could be either a word or a double word. 

There are several forms of references used during tokenisation, viz., Package_ref,
Class_ref and Method_ref. We distinguish Package from Package_ref, and similarly
for the other items. Note that a reference is a composite entity which can be context
dependent (e.g. in the CAP format a class reference can be in internal or external forms).
We assume, however, that sufficient information is given so that references make sense
globally. For example, class names are fully qualified, and class tokens are paired with
a package token. We take field and method references to be to particular members of
some class, and so contain a class reference. In contrast, an identifier is a name or a
token (these are not used at the abstract level though). Using these basic types, we can
then construct complex types using the usual type constructors: (non-dependent) sum,
product, function and list types (denoted $\theta^*$) as we did when defining the environment
at the end of Sect. 2.
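
As an informal illustration of the abstract types of this section, the following Python sketch models Type, Reference_type and Word as tagged unions. The class names (Primitive, ClassRef, ArrayType, ObjectRef, Null), the use of Python's int for Byte and Short, and the helper is_reference are assumptions made for the sketch; they are not part of the formalisation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class Primitive(Enum):
    """The primitive types supported directly on the card."""
    BOOLEAN = "Boolean"
    BYTE = "Byte"
    SHORT = "Short"


@dataclass(frozen=True)
class ClassRef:
    """Abstract class reference; its representation depends on the interpretation."""
    ident: object  # e.g. a fully qualified name, or a (package token, class token) pair


@dataclass(frozen=True)
class ArrayType:
    element: "Type"


# Reference_type = Array_type + Class_ref
ReferenceType = Union[ArrayType, ClassRef]
# Type = {Boolean, Byte, Short} + Reference_type
Type = Union[Primitive, ReferenceType]


@dataclass(frozen=True)
class ObjectRef:
    """Reference to an object on the heap (distinct from a class reference)."""
    address: int


@dataclass(frozen=True)
class Null:
    pass


# Word = Object_ref + Null + Boolean + Byte + Short, and Value = Word.
# Python's bool and int stand in for Boolean, Byte and Short (an assumption).
Word = Union[ObjectRef, Null, bool, int]
Value = Word


def is_reference(w: Word) -> bool:
    """A word is either a (possibly null) reference or a primitive value."""
    return isinstance(w, (ObjectRef, Null))
```

Keeping ObjectRef separate from ClassRef mirrors the distinction drawn above between references to heap objects and references to classes.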

5 Operational Semantics 

We define an operational semantics framework that allows us to model the execution
of both class and CAP files. This is obtained by parameterising the semantics on a
number of auxiliary functions that embody the differences between the two formats. 
This factorisation of the semantics reduces the equivalence proof considerably. 

The official specification of the JCVM (and JVM) is given in terms of frames. A 
frame represents the state of the current method invocation, together with any other 
useful data. We introduce the notion of configuration, consisting of (the abstract syntax
of) the code of the current method still to be executed, the operand stack, the local
variables, and the current class reference. We write these as Config(b, o, l, c) or just
(b, o, l, c). To account for method invocations, we allow a configuration itself to be
considered as an instruction. When a method is invoked, the current instruction becomes
a new configuration. Instead of a stack of frames, then, we have a single piece of ‘code’
(in this general sense). This form of closure is more general than the traditional idea of a
call stack but helps simplify the proof. Method invocation is modelled by replacing the
invoking instruction with a configuration that contains the code of the invoked method
(see the detailed description of invokevirtual below). Execution of a method body is
modelled by allowing transitions inside a configuration.

$$\frac{f \to f'}{(\mathit{Config}\ f, \mathit{ops}, l, c) \to (\mathit{Config}\ f', \mathit{ops}, l, c)}$$

The method invocation instructions (and others) take an argument which is an index 
into either the constant pool of a class file, or into the constant pool component of a CAP 
file. This means that the ‘concrete’ bytecode is itself dependent on the implementation 
and is therefore modelled by an abstract type. Formally, we define a transition relation 

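$$\to\ \subseteq\ \mathit{Config} \times \mathit{Arrow} \times \mathit{Config}$$

using the types

$$
\begin{aligned}
\mathit{Config} &= \mathit{Bytecode} \times \mathit{Word}^* \times \mathit{Locals} \times \mathit{Class\_ref}\\
\mathit{Arrow} &= \mathit{Global\_state} \to \mathit{Global\_state}\\
\mathit{Global\_state} &= \mathit{Environment} \times \mathit{Heap}\\
\mathit{Bytecode} &= \mathit{Instruction} + (\mathit{Bytecode} \times \mathit{Bytecode}) + \mathit{Config}
\end{aligned}
$$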

As mentioned above, the structure of the class/CAP file need not be visible to the 
operational semantics. We use a number of auxiliary functions, some of which have 
preconditions that we take as concomitant with the well-formedness of the class file. 
The definition of method invocation uses the lookup function 

$$\mathit{lookup} : \mathit{Class\_ref} \times \mathit{Method\_ref} \to \mathit{Class\_ref} \times \mathit{Bytecode}$$

that takes the actual class reference, together with the declared method reference (which
contains the class where the method is declared), and returns the class reference where
the method is defined together with the code. The function $\mathit{method\_nargs} : \mathit{Method\_ref} \to \mathit{Nat}$
returns the number of arguments for a given method reference. The instruction for
virtual method invocation is evaluated as follows:

1. The two-byte index, i, into the constant pool is resolved to get the declared method
reference containing the declared class reference and a method identifier (either a
signature or token).

2. The number of arguments to the method is calculated.

3. The object reference, r, is popped off the operand stack.

4. Using the heap, we get heap(r) = (act_cref, _), the actual class reference (fully
qualified name or a package/class token pair).

5. We then do lookup(act_cref, dec_mref), getting the class where the method is
implemented, and its bytecode. The lookup function is used with respect to the class
hierarchy (environment).

6. A configuration is created for this method and evaluation proceeds from there.





E. Denney and T. Jensen 



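$$
\frac{\begin{array}{ll}
\mathit{dec\_mref} := \mathit{constant\_pool}(c)(i) & \text{get declared method reference}\\
n := \mathit{method\_nargs}(\mathit{dec\_mref}) & \text{get number of arguments}\\
(\mathit{act\_cref}, \_) := \mathit{heap}(r) & \text{get actual class reference from heap}\\
(\mathit{m\_cl}, \mathit{m\_cd}) := \mathit{lookup}(\mathit{act\_cref}, \mathit{dec\_mref}) & \text{look up method}
\end{array}}
{(\mathtt{invokevirtual}\ i,\ a_1 \ldots a_n :: r :: s,\ l,\ c) \to ((\mathit{m\_cd}, (), a_1 \ldots a_n :: r, \mathit{m\_cl}),\ s,\ l,\ c)}
$$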
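
The rule for invokevirtual can be read operationally. The following Python sketch is one possible rendering, assuming that the auxiliary functions constant_pool, method_nargs and lookup are supplied by the chosen interpretation (name or token). The Config class, the encoding of the instruction as a pair ("invokevirtual", i), and the representation of the heap as a dictionary are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class Config:
    code: Any        # bytecode still to be executed (may itself be a Config)
    opstack: Tuple   # operand stack, top of stack first
    locals: Tuple    # local variables
    cref: Any        # current class reference


def step_invokevirtual(cfg: Config,
                       heap: Dict[Any, Tuple[Any, Any]],
                       constant_pool: Callable,
                       method_nargs: Callable,
                       lookup: Callable) -> Config:
    """One transition for ('invokevirtual', i), following the six steps above."""
    op, i = cfg.code
    assert op == "invokevirtual"
    dec_mref = constant_pool(cfg.cref)(i)      # declared method reference
    n = method_nargs(dec_mref)                 # number of arguments
    args = cfg.opstack[:n]                     # a1 ... an
    r = cfg.opstack[n]                         # receiver object reference
    rest = cfg.opstack[n + 1:]                 # remaining stack s
    act_cref, _ = heap[r]                      # actual class of the receiver
    m_cl, m_cd = lookup(act_cref, dec_mref)    # defining class and its code
    # The invoking instruction is replaced by a configuration for the callee:
    # empty operand stack, locals a1 ... an :: r, class of definition m_cl.
    callee = Config(m_cd, (), args + (r,), m_cl)
    return Config(callee, rest, cfg.locals, cfg.cref)
```

Because the three auxiliary functions are parameters, the same step function serves both formats; only the functions passed in differ.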

In the following sections we show how to instantiate the semantic framework (in parti- 
cular the lookup function) to obtain a class file and a CAP file semantics. 

6 Interpretations 

The name interpretation gives semantics using Java class files (see Figure 1). Since this 
is fairly standard we give a brief description. Classes are described by fully qualified 
names, whereas methods and fields are given signatures, consisting of an unqualified 
name and a type, together with the class of definition. We assume a function pack_name
which gives the package name of a class name. The data is arranged into class files, each
of which contains all the information corresponding to a particular class. We only give
the interpretation of those parts used here. We group the class files by package into a
global environment so env_name(p)(c) denotes the class file in package p with name c.



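$$
\begin{aligned}
[\![\mathit{Package}]\!]_{name} &= \mathit{Class\_name} \to \mathit{Class\_file}\\
[\![\mathit{Class\_ref}]\!]_{name} &= \mathit{Class\_name}\\
[\![\mathit{Method\_ref}]\!]_{name} &= \mathit{Class\_name} \times \mathit{Sig}\\
\mathit{Sig} &= \mathit{Method\_name} \times [\![\mathit{Type}]\!]^*_{name}\\
[\![\mathit{Class}]\!]_{name} &= \mathit{Class\_file}\\
\mathit{Class\_file} &= \mathit{Class\_flags} \times \mathit{Super} \times \mathit{Methods\_item} \times \mathit{Constant\_pool\_item} \times \mathit{Class\_name}\\
\mathit{Super} &= \mathit{Class\_name} + \mathit{Void}\\
[\![\mathit{Pack\_methods}]\!]_{name} &= \mathit{Class\_name} \to \mathit{Methods\_item}\\
\mathit{Methods\_item} &= \mathit{Sig} \to \mathit{Method\_info}\\
\mathit{Method\_info} &= \mathit{Method\_flags} \times \mathit{Sig} \times ([\![\mathit{Type}]\!]_{name} + \mathit{Void}) \times \mathit{Maxstack} \times \mathit{Maxlocals} \times \mathit{Bytecode}
\end{aligned}
$$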

Fig. 1. Name Interpretation 



Method signatures are not considered to include the return type. We assume that the 
signature in the result of a methods item is the same as the argument. 

There are a number of possibilities for how method lookup should be defined, de- 
pending on the definition of inheritance. For example, [1,8] use a ‘naive’ lookup which 
does not take account of visibility modifiers. A fuller discussion of this appears in [9]. 

In the JCVM, data is arranged by packages into CAP files. Each CAP file consists
of a number of components, but not all are used for method lookup (or, indeed, the rest
of the operational semantics). We just include those components we need here, namely,
the constant pool, class and method components.

lookup_name (act_class, (sig, dec_class)) =
  let dec_pk = pack_name(dec_class)
      act_pk = pack_name(act_class)
      (_, _, meth_dec, _, _) = env_name(dec_pk)(dec_class)
      (_, super, meth_act, _, _) = env_name(act_pk)(act_class)
      (dec_flags, _, _, _, _, _) = meth_dec(sig) in
  if meth_act(sig) = undefined
  then lookup_name(super, (sig, dec_class))
  else if dec_flags(protected) or dec_flags(public) or dec_pk = act_pk
  then let (_, _, _, _, _, code) = meth_act(sig) in (act_class, code)
  else lookup_name(super, (sig, dec_class))

Fig. 2. The lookup function for the class file format.
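
The lookup function of Fig. 2 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' definition: class files are reduced to the two fields the lookup needs, signatures are plain strings, package names are obtained by splitting fully qualified names at the last dot, and the visibility test compares the declaring package with the actual package, which we take to be the intended reading of the figure. As in the paper, the sketch relies on well-formedness: sig must be declared in dec_class, and act_class must be a subclass of dec_class.

```python
from typing import Dict, NamedTuple, Optional, Set, Tuple


class MethodInfo(NamedTuple):
    flags: Set[str]   # e.g. {"public"}, {"protected"}, or set() for package
    code: Tuple       # bytecode of the method body


class ClassFile(NamedTuple):
    super_name: Optional[str]        # None plays the role of Void
    methods: Dict[str, MethodInfo]   # Methods_item: signature -> Method_info


Env = Dict[str, Dict[str, ClassFile]]  # package name -> class name -> class file


def pack_name(class_name: str) -> str:
    """Package part of a fully qualified class name (assumes '.' separators)."""
    return class_name.rpartition(".")[0]


def lookup_name(env: Env, act_class: str, sig: str, dec_class: str):
    """Return (class of definition, code) for sig, starting at act_class."""
    dec_pk, act_pk = pack_name(dec_class), pack_name(act_class)
    dec_flags = env[dec_pk][dec_class].methods[sig].flags
    act = env[act_pk][act_class]
    info = act.methods.get(sig)
    visible = ("protected" in dec_flags or "public" in dec_flags
               or dec_pk == act_pk)
    if info is not None and visible:
        return act_class, info.code
    # Undefined here, or defined but not overriding the declared method.
    return lookup_name(env, act.super_name, sig, dec_class)
```

A package-visible method declared in package p is therefore not overridden by a method with the same signature in a subclass that lives in another package; the search continues in the superclass.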

References to items external to a package are via tokens — for packages, classes, 
and virtual methods — each with a particular range and scope. These are then used to 
find internal offsets into the components. For example, a class reference is either an 
internal offset into the class component of the CAP file of the class’ package, or an 
external reference composed of a package token and a class token. However, since we 
need to relate the reference to class names, we will assume that all references come with 
package information, even though this is superfluous in the case of internal references. 



$$
\begin{aligned}
[\![\mathit{Package}]\!]_{tok} &= \mathit{CAP\_file}\\
\mathit{CAP\_file} &= \mathit{Constant\_pool\_comp} \times \mathit{Class\_comp} \times \mathit{Method\_comp}\\
[\![\mathit{Package\_ref}]\!]_{tok} &= \mathit{Package\_tok}\\
[\![\mathit{Class\_ref}]\!]_{tok} &= \mathit{Package\_tok} \times (\mathit{Class\_tok} + \mathit{Offset})\\
[\![\mathit{Method\_ref}]\!]_{tok} &= [\![\mathit{Class\_ref}]\!]_{tok} \times \mathit{Virtual\_method\_tok}\\
[\![\mathit{Class}]\!]_{tok} &= \mathit{Class\_info}\\
\mathit{Class\_comp} &= \mathit{Offset} \to \mathit{Class\_info}\\
\mathit{Class\_info} &= \mathit{Class\_flags} \times \mathit{Super} \times \mathit{Public\_table} \times \mathit{Package\_table} \times \mathit{Class\_ref}\\
\mathit{Public\_table} &= \mathit{Public\_base} \times \mathit{Public\_size} \times (\mathit{Index} \to \mathit{Offset} + \{\mathtt{0xFFFF}\})\\
\mathit{Package\_table} &= \mathit{Package\_base} \times \mathit{Package\_size} \times (\mathit{Index} \to \mathit{Offset})\\
[\![\mathit{Pack\_methods}]\!]_{tok} &= \mathit{Method\_comp}\\
\mathit{Method\_comp} &= \mathit{Offset} \to \mathit{Method\_info}\\
\mathit{Method\_info} &= \mathit{Method\_flags} \times \mathit{Maxstack} \times \mathit{Nargs} \times \mathit{Max\_locals} \times \mathit{Bytecode}
\end{aligned}
$$

Fig. 3. Token Interpretation

The class component consists of a list of class information structures, each of which
has method tables, giving offsets into the method component, where the method implementations
are found. The lookup algorithm uses tokens to calculate the corresponding
method table index. There are separate tables for public and package methods. Method
access information is given implicitly by the tokens rather than by flags. The two method
tables each contain a base, size and ‘list’ of entries. The entries are defined from the base
to base + size - 1 inclusive. The entry for a public method will be 0xFFFF if the method
is defined in another package.

For a given class reference, the function class_info finds the corresponding class
information structure in the global environment. The variant, class_info', returns the
class information structure in a particular CAP file. The function method_array simply
finds the method component for a given class reference. We assume the existence
of functions class_offset and method_offset for resolving external tokens to internal
offsets. It follows from the definition of the abstract type Environment that the
environment in the token format consists of a mapping from package tokens to their
corresponding CAP file, i.e., $\mathit{env}_{tok} : \mathit{Package\_tok} \to \mathit{CAP\_file}$. The lookup function
takes a class reference (the actual class), a method reference (in the declared class), and
returns the reference to the class where the code is defined, together with the bytecode
itself. The main steps of the algorithm (see Fig. 4) are:

1. Get method array for the package of the actual class.

2. Get class information for the actual class.

3. If public: if defined then get info else lookup super.
   If package: if defined and visible then get info else lookup super.

7 Formalisation of Equivalence

We formalise the equivalence between the class and CAP formats as a family of relations,
$\{R_\theta : [\![\theta]\!]_{name} \leftrightarrow [\![\theta]\!]_{tok}\}_{\theta \in \mathit{Abstract\_type}}$, indexed by abstract type, $\theta$. The idea is that
$x\ R_\theta\ y$ when y is a possible transformation of x. The relations are not necessarily total,
i.e. for some $x : [\![\theta]\!]_{name}$, there may not be a y such that $x\ R_\theta\ y$. Formally, the relations
are defined as a mutually inductive collection of constraints, $R_\theta$, for each type $\theta$, where
the types, $\theta$, are given by the grammar:

$$
\begin{aligned}
\gamma &::= \mathit{Bool} \mid \mathit{Nat} \mid \mathit{Object\_ref} \mid \mathit{Boolean} \mid \mathit{Byte} \mid \mathit{Short} \mid \mathit{Value} \mid \mathit{Word}\\
\iota &::= \mathit{Package\_ref} \mid \mathit{Ext\_class\_ref} \mid \mathit{Class\_ref} \mid \mathit{Method\_ref}\\
\kappa &::= \mathit{CP\_index} \mid \mathit{CP\_info} \mid \mathit{Method\_info} \mid \mathit{Package} \mid \mathit{Class} \mid \mathit{Constant\_pool} \mid \mathit{Pack\_methods}\\
\theta &::= \gamma \mid \iota \mid \kappa \mid \theta \times \theta' \mid \theta \to \theta' \mid \theta + \theta' \mid \theta^*
\end{aligned}
$$

where the observable types are built up inductively from the $\gamma$, i.e. do not contain the $\iota$
and $\kappa$. There are two sources of underspecification. First, the relations really can be
non-functional. Second, there is a choice for what some of the relations are. For example,
$R_{\mathit{Class\_ref}}$ is some bijection satisfying certain constraints. The relations between the
‘large’ structures, however, are completely defined in terms of those between smaller
ones. There are two parts to the transformation itself: the tokenisation, defined as the
relations $R_\iota$, and the ‘componentisation’, defined as the relations $R_\kappa$.

lookup_tok (act_class_ref, (dec_class_ref, method_tok)) =
  let methods = method_array(act_class_ref)
      (_, super, (public_base, _, public_table),
       (package_base, _, package_table), _) : Class_info =
          class_info(act_class_ref) in
  if method_tok div 128 = 0 then /* public */
    if method_tok >= public_base then
      let method_offset = public_table[method_tok - public_base] in
      if method_offset <> 0xFFFF
      then (act_class_ref, methods[method_offset].Bytecode)
      else /* look in superclass */
        lookup_tok(super, (dec_class_ref, method_tok))
    else /* look in superclass */
      lookup_tok(super, (dec_class_ref, method_tok))
  else /* package */
    if method_tok >= package_base /\
       same_package(dec_class_ref, act_class_ref)
    then let method_offset =
           package_table[method_tok mod 128 - package_base]
         in (act_class_ref, methods[method_offset].Bytecode)
    else /* look in superclass */
      lookup_tok(super, (dec_class_ref, method_tok))

Fig. 4. The lookup function for the CAP file format.
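
The token-based lookup of Fig. 4 can be sketched in Python as follows. The data layout is hypothetical: class references are (package token, class index) pairs, the class and method components are dictionaries, and each method table keeps its base together with a list of entries. Two points go slightly beyond the figure and are assumptions of the sketch: the index is also checked against the upper end of the table (entries being defined from base to base + size - 1), and for package tokens the value compared with the base is the token modulo 128, as in the definition of the relation on classes.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

NO_OFFSET = 0xFFFF  # public entry for a method defined in another package

ClassRef = Tuple[int, int]  # (package token, class token or offset)


class MethodTable(NamedTuple):
    base: int
    entries: List[int]  # offsets into the method component; size = len(entries)


class ClassInfo(NamedTuple):
    super_ref: Optional[ClassRef]
    public_table: MethodTable
    package_table: MethodTable


def lookup_tok(classes: Dict[ClassRef, ClassInfo],
               methods: Dict[int, Dict[int, tuple]],
               act_ref: ClassRef, dec_ref: ClassRef, method_tok: int):
    """Return (class reference of definition, bytecode) for a method token."""
    info = classes[act_ref]        # class_info(act_class_ref)
    comp = methods[act_ref[0]]     # method_array(act_class_ref)
    if method_tok // 128 == 0:     # public method token
        table, index, usable = info.public_table, method_tok, True
    else:                          # package-visible method token
        table, index = info.package_table, method_tok % 128
        usable = dec_ref[0] == act_ref[0]   # same_package
    slot = index - table.base
    if usable and 0 <= slot < len(table.entries):
        offset = table.entries[slot]
        if offset != NO_OFFSET:
            return act_ref, comp[offset]
    # Not defined (or not visible) in this class: look in the superclass.
    return lookup_tok(classes, methods, info.super_ref, dec_ref, method_tok)
```

As with the figure, the sketch assumes well-formed input, so that the search succeeds before running out of superclasses.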



7.1 Tokenisation 

The relations, $R_\iota$, represent the tokenisation of items. The general idea is to set up
relations between the names and tokens assigned to the various entities, subject to certain 
constraints described in the specification. 

In order to account for token scope, we relate names to tokens paired with the 
appropriate context information. For example, method tokens are scoped within a class, 
so the relation $R_{\mathit{Method\_ref}}$ is between pairs of class names and signatures, and pairs of
class references and method tokens. We must add a condition, therefore, to ensure that 
the package token corresponds to the package name of this class name. 

We assume that each of these relations is a bijection, modulo the equivalence between
internal and external references (with one exception to account for the copying of virtual
methods, explained below). Formally,

$$a\ R\ b \wedge a'\ R\ b \Rightarrow a = a'$$

$$a\ R\ b \Rightarrow (a\ R\ b' \Leftrightarrow \mathit{Equiv}(b, b'))$$

where equivalence, Equiv, of class references is defined as the reflexive symmetric
closure of:

$$\mathit{Equiv}((\mathit{p\_tok}, \mathit{offset}), (\mathit{p\_tok}, \mathit{c\_tok})) \iff \mathit{class\_offset}(\mathit{p\_tok}, \mathit{c\_tok}) = \mathit{offset}$$

The second condition contains two parts: that the relation is functional modulo Equiv,
and that it is closed under Equiv. We say that R is an external bijection when these
conditions hold. We extend the definition of Equiv and external bijection to the other
references.
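
For a finite relation, the two conditions defining an external bijection can be checked directly. The following Python sketch does so; the encoding of class references as (package token, ("tok", n)) or (package token, ("off", n)), the function names, and the dictionary standing in for class_offset are assumptions made for illustration.

```python
from itertools import product
from typing import Callable, Dict, Hashable, Iterable, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def make_equiv(class_offset: Dict[Tuple[int, int], int]) -> Callable:
    """Reflexive symmetric closure of (p, ('off', o)) ~ (p, ('tok', t))
    whenever class_offset[(p, t)] == o."""
    def related(x, y) -> bool:
        (p, (kind, n)), (p2, (kind2, n2)) = x, y
        return (p == p2 and kind == "off" and kind2 == "tok"
                and class_offset.get((p, n2)) == n)

    def equiv(b, b2) -> bool:
        return b == b2 or related(b, b2) or related(b2, b)
    return equiv


def is_external_bijection(rel: Set[Pair],
                          refs: Iterable[Hashable],
                          equiv: Callable) -> bool:
    """Check both conditions on a finite relation rel over candidate refs."""
    refs = list(refs)
    # First condition: a R b and a' R b imply a = a'.
    for (a, b), (a2, b2) in product(rel, rel):
        if b == b2 and a != a2:
            return False
    # Second condition: a R b implies (a R b' iff Equiv(b, b')).
    for (a, b), b2 in product(rel, refs):
        if ((a, b2) in rel) != equiv(b, b2):
            return False
    return True
```

A relation that maps a class name to its external token but not to the equivalent internal offset fails the second condition, because it is not closed under Equiv.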

These relations are defined with respect to the environment (in name format). We use
a number of abbreviations for extracting information from the environment. We write
$c < d$ for the subclass relation (i.e. the transitive closure of the direct subclass relation)
and $\leq$ for its reflexive closure. In the token interpretation this is modulo Equiv. We write
$\mathit{m\_tok} \in \mathit{c\_ref}$ when a method with token m_tok is declared in the class with reference
c_ref, and pack_name(c) for the package name of the class named c.

We define a function Class_flag for checking the presence of attributes such as
public, final, etc. The tokenisation uses the notion of external visibility.

$$\mathit{Externally\_visible}(\mathit{c\_name}) = \mathit{Class\_flag}(\mathit{c\_name}, \mathit{Public})$$

We will also write public(sig) and package(sig) according to the visibility of a method.

Package_ref: As mentioned above, we take package tokens to be externally visible.
The relation $R_{\mathit{Package\_ref}}$ is simply defined as any bijection between package names
and tokens.

Ext_class_ref: In order to define the relation for class references we first define
the relation for external class references. We define $R_{\mathit{Ext\_class\_ref}}$ as a bijection between
class names and external class references such that:

$$\mathit{c\_name}\ R_{\mathit{Ext\_class\_ref}}\ (\mathit{p\_tok}, \mathit{c\_tok}) \Rightarrow \mathit{Externally\_visible}(\mathit{c\_name}) \wedge \mathit{pack\_name}(\mathit{c\_name})\ R_{\mathit{Package\_ref}}\ \mathit{p\_tok}$$

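Method_ref: This is not a bijection because of the possibility of copying. Although
‘from names to tokens’ we do have:

$$
\begin{aligned}
&(\mathit{c\_name}, \mathit{sig})\ R_{\mathit{Method\_ref}}\ (\mathit{c\_ref}, \mathit{m\_tok})\ \wedge\ (\mathit{c'\_name}, \mathit{sig'})\ R_{\mathit{Method\_ref}}\ (\mathit{c\_ref}, \mathit{m\_tok})\\
&\quad\Rightarrow\ \mathit{c\_name} = \mathit{c'\_name} \wedge \mathit{sig} = \mathit{sig'}
\end{aligned}
$$

for a converse we have:

$$
\begin{aligned}
&(\mathit{c\_name}, \mathit{sig})\ R_{\mathit{Method\_ref}}\ (\mathit{c\_ref}, \mathit{m\_tok})\ \wedge\ (\mathit{c\_name}, \mathit{sig})\ R_{\mathit{Method\_ref}}\ (\mathit{c'\_ref}, \mathit{m'\_tok})\\
&\quad\Rightarrow\ (\mathit{c\_ref} \leq \mathit{c'\_ref} \vee \mathit{c'\_ref} \leq \mathit{c\_ref}) \wedge \mathit{m\_tok} = \mathit{m'\_tok}
\end{aligned}
$$

The first condition says that if a method overrides a method implemented in a superclass,
then it gets the same token. Restrictions on the language mean that overriding cannot
change the method modifier from public to package or vice versa.

$$
\begin{aligned}
&(\mathit{c\_name}, \mathit{sig})\ R_{\mathit{Method\_ref}}\ (\mathit{c\_ref}, \mathit{m\_tok})\ \wedge\\
&(\mathit{c'\_name}, \mathit{sig})\ R_{\mathit{Method\_ref}}\ (\mathit{c'\_ref}, \mathit{m'\_tok})\ \wedge\\
&\mathit{c'\_name} < \mathit{c\_name}\ \wedge\\
&(\mathit{package}(\mathit{sig}) \Rightarrow \mathit{same\_package}(\mathit{c\_name}, \mathit{c'\_name}))\\
&\quad\Rightarrow\ \mathit{m\_tok} = \mathit{m'\_tok}
\end{aligned}
$$

The second condition says that the tokens for public introduced methods must have
higher token numbers than those in the superclass. We assume a predicate, new_method,
which holds of a method signature and class name when the method is defined in the
class, but not in any superclass.

$$
\begin{aligned}
&\mathit{public}(\mathit{sig}) \wedge \mathit{new\_method}(\mathit{sig}, \mathit{c\_name})\ \wedge\ (\mathit{c\_name}, \mathit{sig})\ R_{\mathit{Method\_ref}}\ (\mathit{c\_ref}, \mathit{m\_tok})\\
&\quad\Rightarrow\ \forall \mathit{m'\_tok} \in \mathit{super}(\mathit{c\_ref}).\ \mathit{m\_tok} > \mathit{m'\_tok}
\end{aligned}
$$

Package-visible tokens for introduced methods are similarly numbered, if the superclass
is in the same package.

$$
\begin{aligned}
&\mathit{package}(\mathit{sig}) \wedge \mathit{new\_method}(\mathit{sig}, \mathit{c\_name})\ \wedge\ (\mathit{c\_name}, \mathit{sig})\ R_{\mathit{Method\_ref}}\ (\mathit{c\_ref}, \mathit{m\_tok})\ \wedge\\
&\mathit{same\_package}(\mathit{c\_name}, \mathit{super}(\mathit{c\_name}))\\
&\quad\Rightarrow\ \forall \mathit{m'\_tok} \in \mathit{super}(\mathit{c\_ref}).\ \mathit{m\_tok} > \mathit{m'\_tok}
\end{aligned}
$$

The third condition says that public tokens are in the range 0 to 127, and package tokens
in the range 128 to 255.

$$
\begin{aligned}
&(\mathit{c\_name}, \mathit{sig})\ R_{\mathit{Method\_ref}}\ (\mathit{c\_ref}, \mathit{m\_tok})\ \Rightarrow\\
&\quad(\mathit{public}(\mathit{sig}) \Rightarrow 0 \leq \mathit{m\_tok} \leq 127) \wedge (\mathit{package}(\mathit{sig}) \Rightarrow 128 \leq \mathit{m\_tok} \leq 255)
\end{aligned}
$$

The specification [11] also says that tokens must be contiguously numbered starting
at 0 but we will not enforce this.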
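
The numbering constraints on method tokens are easy to test on concrete data. The Python sketch below checks the range condition and the ordering condition for newly introduced methods. The function names and the string encoding of visibility are hypothetical; in new_token_ok the caller is assumed to pass only those superclass tokens that have the same visibility as the new method.

```python
from typing import Iterable

PUBLIC, PACKAGE = "public", "package"


def token_in_range(visibility: str, m_tok: int) -> bool:
    """Third condition: public tokens lie in 0..127, package tokens in 128..255."""
    if visibility == PUBLIC:
        return 0 <= m_tok <= 127
    return 128 <= m_tok <= 255


def visibility_of_token(m_tok: int) -> str:
    """Access information is implicit in the token (cf. 'method_tok div 128')."""
    return PUBLIC if m_tok // 128 == 0 else PACKAGE


def new_token_ok(m_tok: int, super_tokens: Iterable[int]) -> bool:
    """Second condition: a newly introduced method is numbered above every
    token in super_tokens (superclass tokens of the same visibility)."""
    return all(m_tok > t for t in super_tokens)
```

For example, a class whose superclass declares public tokens 0, 1 and 2 may introduce a new public method with token 3, but not with token 2.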



7.2 Componentisation 

The relations in the previous section formalise the correspondence between named and
tokenised entities. When creating the CAP file components, all the entities are converted,
including the package visible ones. Thus at this point we define $R_{\mathit{Class\_ref}}$ as a
relation between named items and either external tokens or internal references, subject
to coherence constraints.

We must ensure that if a name corresponds to both an external token and to an internal
offset, then the token and the offset correspond to the same entity. We ensure this by
using the offset function $\mathit{class\_offset} : \mathit{Package\_tok} \times \mathit{Class\_tok} \to \mathit{Offset}$
which returns the internal offset corresponding to an external token, and then define
$R_{\mathit{Class\_ref}}$ from this and $R_{\mathit{Ext\_class\_ref}}$. Clearly, therefore, $R_{\mathit{Class\_ref}}$ is not a bijection.

Class_ref: We define $R_{\mathit{Class\_ref}}$ as an external bijection which respects
$R_{\mathit{Ext\_class\_ref}}$, that is, such that

$$\mathit{c\_name}\ R_{\mathit{Class\_ref}}\ (\mathit{p\_tok}, \mathit{c\_tok}) \iff \mathit{c\_name}\ R_{\mathit{Ext\_class\_ref}}\ (\mathit{p\_tok}, \mathit{c\_tok}).$$

Thus $R_{\mathit{Class\_ref}}$ extends $R_{\mathit{Ext\_class\_ref}}$ to internal references.

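Method_info: We only treat certain parts of the method information here:

$$
\begin{aligned}
&(\mathit{flags}, \mathit{sig}, \mathit{maxstack}, \mathit{maxlocals}, \mathit{code}, \_)\ R_{\mathit{Method\_info}}\ (\mathit{flags'}, \mathit{maxstack'}, \mathit{nargs'}, \mathit{maxlocals'}, \mathit{code'})\\
&\quad\iff\\
&\mathit{flags}\ R_{\mathit{Method\_flags}}\ \mathit{flags'}\ \wedge\\
&\mathit{maxstack} = \mathit{maxstack'}\ \wedge\\
&\mathit{size}(\mathit{sig}) = \mathit{nargs'}\ \wedge\\
&\mathit{maxlocals} = \mathit{maxlocals'}\ \wedge\\
&\mathit{code}\ R_{\mathit{Bytecode}}\ \mathit{code'}
\end{aligned}
$$

In the name interpretation, information is grouped by the package and so, for example,
$[\![\mathit{Pack\_methods}]\!]_{name} : \mathit{Class\_name} \to \mathit{Methods\_item}$ is the ‘set’ of method data
for all classes. In the token format the method information is spread between the two
components. The coupling relations reflect this: the relation $R_{\mathit{Class}}$ ensures that a named
method corresponds to a particular offset, and $R_{\mathit{Pack\_methods}}$ ensures that the entry at this
offset is related by $R_{\mathit{Method\_info}}$.

Pack_methods: The method item and method component contain the implementations
of both static and virtual methods.

$$
\begin{aligned}
&\mathit{methods\_name}\ R_{\mathit{Pack\_methods}}\ \mathit{method\_comp} \iff\\
&\quad\forall (\mathit{c\_name}, \mathit{sig})\ R_{\mathit{Method\_ref}}\ (\mathit{p\_tok}, \mathit{c\_tok}, \mathit{m\_tok}).\\
&\qquad\mathit{methods\_name}(\mathit{c\_name})(\mathit{sig})\ R_{\mathit{Method\_info}}\ \mathit{method\_comp.methods}(\mathit{method\_offset}(\mathit{p\_tok}, \mathit{c\_tok}, \mathit{m\_tok}))
\end{aligned}
$$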

Class: We define $R_{\mathit{Class}}$. There are a number of equivalences expressing correctness
of the construction of the class component. For the lookup, the significant ones are those
between the method tables. These say that if a method is defined in the name format,
then it must be defined (and equivalent) in the token format. Since the converse is not
required, this means we can copy method tokens from a superclass. Instead, there is
a condition saying that if there is a method token, then there must be a corresponding
signature in some superclass.

If a method is visible in a class, then there must be an entry in the method table,
indicating how to find the method information structure in the appropriate method component.
For package visible methods this implies that the method must be in the same
package. For public methods, if the two classes are in the same package, then this entry
is an offset into the method component of this package. Otherwise, the entry is 0xFFFF,
indicating that we must use the method token to look in another package.

The class component only contains part of the information contained in the class
files. The full definition is given in Figure 5 (writing c_name for cf.Class_name and
c_ref for ci.Class_ref). The offset functions link the various relations. We make a
global assumption (in fact, local to an environment) of the existence of class_offset
and method_offset.

Equivalence proof: The full proof establishes that the auxiliary functions preserve the
appropriate relations [2]. Here, we state the main lemma for the function lookup whose
type is $\mathit{Class\_ref} \times \mathit{Method\_ref} \to \mathit{Class\_ref} \times \mathit{Bytecode}$.

Lemma 1. If the heap and environment are related in the two formats, then:

$$\mathit{lookup}_{name}\ R_{\mathit{Class\_ref} \times \mathit{Method\_ref} \to \mathit{Class\_ref} \times \mathit{Bytecode}}\ \mathit{lookup}_{tok}$$

In order to use the operational semantics with the logical relations approach it is
convenient to view the operational semantics as giving an interpretation. We define
$[\![\mathit{code}]\!]((\mathit{env}, \mathit{heap}, \mathit{opstack}, \mathit{loc\_vars}, \mathit{m\_ref}))$ as the resulting state from the (unique)
transition from (code, opstack, loc_vars) with environment env and heap heap. Thus
interpreted bytecode has type $\mathit{State} \to \mathit{Bytecode} \times \mathit{State}$ where State is

$$\mathit{State} = \mathit{Global\_state} \times \mathit{Operand\_stack} \times \mathit{Local\_variables} \times \mathit{Class\_ref}$$

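Now, the following fact is trivial to show: if $R_\gamma = id_\gamma$ for all basic observable types,
then $R_\theta = id_\theta$ for all observable $\theta$. In combination with the following theorem, then, this
says that if a transformation satisfies certain constraints (formally expressed by saying
that it is contained in R) then it is correct, in the sense that no difference can be observed
in the two semantics. In particular, we can observe the operand stack (of observable type
$\mathit{Word}^*$) and the local variables (of observable type $\mathit{Nat} \to \mathit{Word}$) so these are identical
under the two formats.

$$
\begin{aligned}
&\mathit{cf} : \mathit{Class\_file}\ \ R_{\mathit{Class}}\ \ \mathit{ci} : \mathit{Class\_info} \iff\\
&\quad \mathit{cf.Class\_flags}\ R_{\mathit{Class\_flags}}\ \mathit{ci.Class\_flags}\ \wedge\\
&\quad \mathit{cf.Super}\ R_{\mathit{Class\_ref}}\ \mathit{ci.Super}\ \wedge\\
&\quad \forall \mathit{sig} \in \mathit{cf.Methods\_item}.\\
&\qquad \mathit{public}(\mathit{sig}) \Rightarrow\\
&\qquad\quad \exists \mathit{m\_tok}.\ \mathit{ci.Public\_base} \leq \mathit{m\_tok} < \mathit{ci.Public\_base} + \mathit{ci.Public\_size}\ \wedge\\
&\qquad\qquad (\mathit{c\_name}, \mathit{sig})\ R_{\mathit{Method\_ref}}\ (\mathit{c\_ref}, \mathit{m\_tok})\ \wedge\\
&\qquad\qquad \mathit{ci.Public\_table}[\mathit{m\_tok} - \mathit{ci.Public\_base}] = \mathit{method\_offset}(\mathit{c\_ref}, \mathit{m\_tok})\\
&\qquad \wedge\\
&\qquad \mathit{package}(\mathit{sig}) \Rightarrow\\
&\qquad\quad \exists \mathit{m\_tok}.\ \mathit{ci.Package\_base} \leq \mathit{m\_tok}\ \&\ 127 < \mathit{ci.Package\_base} + \mathit{ci.Package\_size}\ \wedge\\
&\qquad\qquad (\mathit{c\_name}, \mathit{sig})\ R_{\mathit{Method\_ref}}\ (\mathit{c\_ref}, \mathit{m\_tok})\ \wedge\\
&\qquad\qquad \mathit{ci.Package\_table}[\mathit{m\_tok}\ \&\ 127 - \mathit{ci.Package\_base}] = \mathit{method\_offset}(\mathit{c\_ref}, \mathit{m\_tok})\\
&\quad \wedge\\
&\quad \forall \mathit{m\_tok} \in \mathit{ci.Public\_table} \cup \mathit{ci.Package\_table}.\ \exists \mathit{sig}.\ \exists \mathit{c'\_name}.\\
&\qquad (\mathit{c'\_name}, \mathit{sig})\ R_{\mathit{Method\_ref}}\ (\mathit{c\_ref}, \mathit{m\_tok})\ \wedge\ \mathit{c\_name} \leq \mathit{c'\_name}\ \wedge\\
&\qquad \mathit{public}(\mathit{sig}) \Rightarrow [\mathit{same\_package}(\mathit{c\_name}, \mathit{c'\_name}) \Leftrightarrow \mathit{ci.Public\_table}[\mathit{m\_tok} - \mathit{ci.Public\_base}] \neq \mathtt{0xFFFF}]
\end{aligned}
$$

Fig. 5. Definition of $R_{\mathit{Class}}$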



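Theorem 1. Assume that $\mathit{env}_{name}\ R_{\mathit{Environment}}\ \mathit{env}_{tok}$, $\mathit{heap}_{name}\ R_{\mathit{Heap}}\ \mathit{heap}_{tok}$,
$\mathit{ls}\ R_{\mathit{Local\_state}}\ \mathit{ls'}$ and $\mathit{code}\ R_{\mathit{Bytecode}}\ \mathit{code'}$. Then

$$[\![\mathit{code}]\!]_{name}(\mathit{env}_{name}, \mathit{heap}_{name}, \mathit{ls})\ \ R_{\mathit{Bytecode} \times \mathit{State}}\ \ [\![\mathit{code'}]\!]_{tok}(\mathit{env}_{tok}, \mathit{heap}_{tok}, \mathit{ls'})$$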

8 Conclusion 

We have formalised the virtual machines and file formats for Java and Java Card, and 
the optimisation as a relation between the two. Correctness of this optimisation was 
expressed in terms of observable equivalence of the operational semantics, and this was 
deduced from the constraints that define the optimisation. Although the framework we 
have presented is quite general, the proof is specific to the instantiations of auxiliary 
functions we chose. It could be argued that we might have proven the equivalence of 
two incorrect implementations of lookup. The remedy for this would be to specify the 
functions themselves, and independently prove their correctness. Furthermore, we have 
made a number of simplifications which could be relaxed. We have used a simple 
definition of $R_{\mathit{Bytecode}}$ here, which just accounts for the changing indexes into constant
pools (as well as method references in configurations). We have not considered inlining 
or the specialisation of instructions, however. We expressed equivalence in terms of an 
identity at observable types but we should also account for the difference in word size, 
as in [3]. Although the specialisation of instructions could be handled by our technique, 
the extension is less clear for the more non-local optimisations. 

We emphasise that the particular form of operational semantics used here is orthogonal
to the rest of the proof. This version suffices for the instructions considered here, but
could easily be changed (along with the definition of $R_{\mathit{Bytecode}}$). The auxiliary functions
could be given different definitions; for example, an abstract interpretation or, going in
the opposite direction, including error information.

The definitions have been formalised in Coq, and the lemmas verified [9]. The discipline
this imposed on the work presented here was very helpful in revealing errors.
Even just getting the definitions to type-check uncovered many errors. We take the com- 
plexity of the proofs (in Coq) as evidence for the merit in separating the correctness of 
a particular algorithm from the correctness of the specification. In fact, the operational 
semantics, correctness of the specification, and development of the algorithm are all 
largely independent of each other. 

As mentioned in the introduction, there are two main steps to showing correctness: 

1. Give an abstract characterisation of all possible transformations and show that the
abstract properties guarantee correctness. 

2. Show that an algorithm implementing such a transformation exists. 

We are currently working on a formal development of a tokenisation algorithm using
Coq’s program extraction mechanism together with constraint-solving tactics.

Abstract. We exhibit a technique for automatically verifying the safety
of simple C programs working on tree-shaped data structures. We do 
not consider the complete behavior of programs, but only attempt to 
verify that they respect the shape and integrity of the store. A verified 
program is guaranteed to preserve the tree-shapes of data structures, to 
avoid pointer errors such as NULL dereferences, leaking memory, and 
dangling references, and furthermore to satisfy assertions specified in a 
specialized store logic. 

A program is transformed into a single formula in WSRT, an extension of 
WS2S that is decided by the MONA tool. This technique is complete for 
loop-free code, but for loops and recursive functions we rely on Hoare- 
style invariants. A default well-formedness invariant is supplied and can 
be strengthened as needed by programmer annotations. If a program fails 
to verify, a counterexample in the form of an initial store that leads to 
an error is automatically generated. 

This extends previous work that uses a similar technique to verify a 
simpler syntax manipulating only list structures. In that case, programs 
are translated into WS1S formulas. A naive generalization to recursive
data-types determines an encoding in WS2S that leads to infeasible com- 
putations. To obtain a working tool, we have extended MONA to directly 
support recursive structures using an encoding that provides a neces- 
sary state-space factorization. This extension of MONA defines the new 
WSRT logic together with its decision procedure. 



1 Introduction 

Catching pointer errors in programs is a difficult task that has inspired many
assisting tools. Traditionally, these come in three flavors. First, tools such as
Purify [3] and Insure++ [17] instrument the generated code to monitor the runtime
behavior, thus indicating errors and their sources. Second, traditional compiler
technologies such as program slicing [21], pointer analysis [7], and shape
analysis [19] are used in tools like CodeSurfer [8] and Aspect [10] that conservatively
detect known causes of errors. Third, full-scale program verification is attempted
by tools like LCLint [6] and ESC [5], which capture runtime behavior as formulas
and then appeal to general theorem provers.

All three approaches lead to tools that are either incomplete or unsound (or 
both), even for straight-line code. In practice, this may be perfectly acceptable 
if a significant number of real errors are caught. 

G. Smolka (Ed.): ESOP/ETAPS 2000, LNCS 1782, pp. 119-134, 2000.

© Springer-Verlag Berlin Heidelberg 2000

In previous work [11], we suggest a different balance point by using a less
expressive program logic for which Hoare triples on loop-free code are decidable
when integer arithmetic is ignored. That work is restricted by allowing only
a while-language working on linear lists. In the present paper we extend our
approach by allowing recursive functions working on recursive data-types. This
generalization is conceptually simple but technically challenging, since programs
must now be encoded in WS2S rather than the simpler WS1S. Decision procedures
for both logics are provided by the MONA tool [13, 18] on which we rely, but
a naive generalization of the previous encoding leads to infeasible computations.
We have responded by extending MONA to directly support a logic of recursive 
data-types, which we call WSRT. This logic is encoded in WS2S in a manner 
that exploits the internal representation of MONA automata to obtain a much 
needed state-space factorization. 

Our resulting tool catches all pointer errors, including NULL dereferences, 
leaking memory, and dangling references. It can also verify assertions provided 
by the programmer in a special store logic. The tool is sound and complete for 
loop-free code including if-statements with restricted conditions: it will reject 
exactly the code that may cause errors or violate assertions when executed in 
some initial store. For while-loops or functions, the tool relies on annotations in 
the form of invariants and pre- and post-conditions. In this general case, our tool 
is sound but incomplete: safe programs exist that cannot be verified regardless 
of the annotations provided. In practical terms, we provide default annotations 
that in many cases enable verification. 

Our implementation is reasonably efficient, but can only handle programs of 
moderate sizes, such as individual operations of data-types. If a program fails 
to verify, a counterexample is provided in the form of an initial store leading 
to an error. A special simulator is supplied that can trace the execution of a 
program and provide graphical snapshots of the store. Thus, a reasonable form 
of compile-time debugging is made available. While we do not detect all program
errors, the verification provided serves as a fine-meshed filter for most bugs.

As an example, consider the following recursive data-type of binary trees 
with red, green, or blue nodes: 

struct RGB {
  enum {red, green, blue} color;
  struct RGB *left;
  struct RGB *right;
};



The following non-trivial application collects all green leaves into a right-linear 
tree and changes all the blue nodes to become red: 

/**data**/ struct RGB *tree;
/**data**/ struct RGB *greens;
enum bool {false, true};

enum bool greenleaf(struct RGB *t) {
  if (t==0) return false;
  if (t->color!=green) return false;
  if (t->left!=0 || t->right!=0) return false;
  return true;
}

void traverse(struct RGB *t) {
  struct RGB *x;
  if (t!=0) {
    if (t->color==blue) t->color = red;
    if (greenleaf(t->left)==true /**keep: t!=0 **/) {
      t->left->right = greens;
      greens = t->left;
      t->left=0;
    }
    if (greenleaf(t->right)==true /**keep: t!=0 **/) {
      t->right->right = greens;
      greens = t->right;
      t->right=0;
    }
    traverse(t->left); /**keep: t!=0 **/
    traverse(t->right); /**keep: t!=0 **/
  }
}

/**pre: greens==0 **/
main() { traverse(tree); }

The special comments are assertions that the programmer must insert to specify 
the intended model (/**data**/), restrict the set of stores under consideration 
(/**pre**/), or aid the verifier (/**keep**/). They are explained further in 
Section 2.4. 

Without additional annotations, our tool can verify this program (in 33 se- 
conds on a 266MHz Pentium II PC with 128 MB RAM). This means that no 
pointer errors occur during execution from any initial store. Furthermore, both 
tree and greens are known to remain well-formed trees. Using the assertion: 

all p: greens (->left + ->right)*==p => (p!=0 => p->color==green) 

we can verify (in 74 seconds) that greens after execution contains only green 
nodes. That greens is right-linear is expressed through the assertion: 

all p: greens (->left + ->right)*==p => (p!=0 => p->left==0)

In contrast, if we assert that greens ends up empty, the tool responds with a 
minimal counterexample in the form of an initial store in which tree contains a 
green leaf. 
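
The two assertions above quantify over every cell p reachable from greens through left and right pointers. As an informal illustration only, the same two properties can be checked dynamically on one concrete store. The tool decides them statically for all initial stores and does not execute code like this; the helper names all_green and right_linear are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct RGB {
    enum { red, green, blue } color;
    struct RGB *left;
    struct RGB *right;
};

/* all p: greens (->left + ->right)*==p => (p!=0 => p->color==green) */
static int all_green(const struct RGB *t) {
    if (t == NULL) return 1;          /* p!=0 => ... holds vacuously for NULL */
    if (t->color != green) return 0;
    return all_green(t->left) && all_green(t->right);
}

/* all p: greens (->left + ->right)*==p => (p!=0 => p->left==0) */
static int right_linear(const struct RGB *t) {
    /* If no cell on the right spine has a left child, the right spine
       is the whole set of reachable cells, so walking it suffices. */
    for (; t != NULL; t = t->right)
        if (t->left != NULL) return 0;
    return 1;
}
```

Such a check only covers the single store it is run on, which is exactly the limitation that the static verification removes.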

An example of the simulator used in conjunction with counterexamples comes
from the following fragment of an implementation of red-black search trees.
Consider the following program, which performs a left rotation of a node n with
parent p in such a tree:

struct Node {
  enum {red, black} color;
  struct Node *left;
  struct Node *right;
};

/**data**/ struct Node *root;

/**pre: n!=0 & n->right!=0 &
        (p!=0 => (p->left==n | p->right==n)) &
        (p==0 => n==root) **/
void left_rotate(struct Node *n, struct Node *p) {
  struct Node *t;
  t = n->right;
  n->right = t->left;
  if (n==root) root = t;
  else if (p->left==n) p->left = t;
  else p->right = t;
  t->left = n;
}

In our assertion language, we cannot express the part of the red-black data-type 
invariant that each path from the root to a leaf must contain the same number 
of black nodes; however we can capture the part that the root is black and that 
a red node cannot have red children: 

root->color==black &
all p: p->color==red =>
  (p->left->color!=red & p->right->color!=red)

If we add the above assertion as a data-type invariant, we are (in 18 seconds) 
given a counterexample. If we apply the simulator, we see the following example 
run, which shows that we have forgotten to consider that the root may become 
red (in which case we should add a line of code coloring it black): 



[Figure: three simulator snapshots of the store during the example run, with the variables p, n, root, and t marked]




Such detailed feedback at compile-time is clearly a useful debugging tool. 
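
The kind of store reported as a counterexample can be reproduced by hand. The sketch below is an illustration under assumptions, not output of the tool: it builds a two-cell store with a black root n and a red right child t, runs the rotation from the text, and checks the partial invariant dynamically. The functions invariant_holds and left_rotate_fixed are hypothetical; treating a NULL child as "not red" and an empty tree as acceptable are also assumptions made here.

```c
#include <assert.h>
#include <stddef.h>

struct Node {
    enum { red, black } color;
    struct Node *left;
    struct Node *right;
};

static struct Node *root;

/* The rotation from the text; the assert mirrors part of its pre-condition. */
static void left_rotate(struct Node *n, struct Node *p) {
    struct Node *t;
    assert(n != NULL && n->right != NULL);
    t = n->right;
    n->right = t->left;
    if (n == root) root = t;
    else if (p->left == n) p->left = t;
    else p->right = t;
    t->left = n;
}

/* Hypothetical repair for the reported case: recolor the root afterwards. */
static void left_rotate_fixed(struct Node *n, struct Node *p) {
    left_rotate(n, p);
    root->color = black;
}

static int is_red(const struct Node *x) {
    return x != NULL && x->color == red;
}

static int no_red_red(const struct Node *x) {
    if (x == NULL) return 1;
    if (x->color == red && (is_red(x->left) || is_red(x->right))) return 0;
    return no_red_red(x->left) && no_red_red(x->right);
}

/* Dynamic check of: the root is black, and no red cell has a red child. */
static int invariant_holds(void) {
    return (root == NULL || root->color == black) && no_red_red(root);
}
```

The sketch only shows that recoloring repairs this particular store; it makes no claim that the repaired function passes verification for every store satisfying the pre-condition.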

2 The Language

The language under consideration is a simple yet non-trivial subset of C. It allows
declaration of tree-shaped recursively typed data structures and recursive
imperative functions operating on the trees. The subset is chosen such that the
verification we intend to perform becomes decidable. Thus, for instance, integer
arithmetic is omitted from the language; only finite enumeration types can be
expressed. Also, to simplify the presentation, many other C constructs have been
omitted, although some of them could easily be included directly and others by
approximating their behavior.

We begin by defining the core language. After that, we describe how pro- 
grams can be annotated with formulas expressing additional requirements for 
correctness. 



2.1 The C Subset 

The abstract syntax of the C subset is defined using EBNF notation, where
furthermore ⊛ is used to denote comma-separated lists with zero or more elements.
The semantics of the language is as known from C. 

A program consists of declarations of structures, enumerations, variables, and 
functions: 

program → ( struct | enum | var | function )*

A structure contains an enumeration denoting its value and a union of structures 
containing pointers to its child structures. An enumeration is a list of identifiers: 

struct → struct id {
           enum id id;
           union {
             ( struct {
                 ( struct id * id; )*
               } id; )*
           } id;
         };

enum → enum id { id⊛ };

The enumeration values denote the kind of the structure, and the kind determines 
which is the active union member. The association between enumeration values 
and union members is based on their indices in the two lists. Such data structures 
are typical in real-world C programs and exactly define recursive data-types. One 
goal of our verification is to ensure that only active union members are accessed. 
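
As a concrete instance of this grammar, the following sketch declares a type with two kinds, where the position of each enumeration value matches the position of its union member. All names (Kind, Tree, u, b, v, left_of, size) are hypothetical and chosen only for illustration; the runtime assert stands in for the property that the verifier establishes statically.

```c
#include <assert.h>
#include <stddef.h>

enum Kind { unary, binary };     /* index 0 and index 1 */

struct Tree {
    enum Kind kind;              /* selects the active union member */
    union {
        struct { struct Tree *child; } u;                     /* member 0: kind==unary  */
        struct { struct Tree *left; struct Tree *right; } b;  /* member 1: kind==binary */
    } v;
};

/* Reads a field of the binary member; only legal when that member is active. */
static struct Tree *left_of(const struct Tree *t) {
    assert(t != NULL && t->kind == binary);
    return t->v.b.left;
}

/* Counts cells, consulting the kind before touching the union. */
static int size(const struct Tree *t) {
    if (t == NULL) return 0;
    switch (t->kind) {
    case unary:  return 1 + size(t->v.u.child);
    case binary: return 1 + size(t->v.b.left) + size(t->v.b.right);
    }
    return 0;                    /* not reached for well-typed cells */
}
```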

For abbreviation we allow declarations of structures and enumerations to be
inlined. Also, we allow ( struct id * id; )* in place of union {...}, implicitly
meaning that all union members are identical. A variable is either a pointer to
a structure or an enumeration:

var → type id;

type → struct id * | enum id

A function can contain variable declarations and statements:

function → ( void | type ) id( ( type id )⊛ ) {
             var* stm?
             ( return rvalue; )?
           }

A statement is a sequence, an assignment, a function call, a conditional state- 
ment, a while-loop, or a memory deallocation: 

stm → stm stm |
      lvalue = rvalue; |
      id( ( rvalue )⊛ ); |
      if ( cond ) stm ( else stm )? |
      while ( cond ) stm |
      free( lvalue );

A condition is a boolean expression evaluating to either true or false; the ex- 
pression ? represents non-deterministic choice and can be used in place of those 
C expressions that are omitted from our subset language: 

cond → cond & cond |
       cond | cond |
       ! cond |
       rvalue == rvalue |
       ?

An lvalue is an expression designating an enumeration variable or a pointer 
variable. An rvalue is an expression evaluating to an enumeration value or to 
a pointer to a structure. The constant 0 is the NULL pointer, malloc allocates
memory on the heap, and id(...) is a function call:

lvalue → id ( -> id ( . id )? )*

rvalue → lvalue | 0 | malloc(sizeof(id)) | id( rvalue⊛ )

The nonterminal id represents identifiers. 

The presentation of our verification technique is based on C for familiarity 
reasons only — no intrinsic C constructs are utilized. 



2.2 Modeling the Store 

During execution of a program, structures located in the heap are allocated and 
freed, and field variables and local variables are assigned values. The state of an 
execution can be described by a model of the heap and the local variables, called 
the store. 

A store is modeled as a finite graph, consisting of a set of cells representing 
structures, a distinguished NULL cell, a set of program variables, and pointers 
from cells or program variables to cells. Each cell is labeled with a value taken 
from the enumerations occurring in the program. Furthermore, each cell can 
have a free mark, meaning that it is currently not allocated. 

Program variables are those that are declared in the program either globally 
or inside functions. To enable the verification, we need to classify these variables 
as either data or pointer variables. A variable is classified as a data variable by 
prefixing its declaration in the program with the special comment /**data**/; 
otherwise, it is considered a pointer variable.

A store is well-formed if it satisfies the following properties:

— the cells and pointers form disjoint tree structures (the NULL cell may be 
shared, though); 

— each data variable points either to the root of a tree or to the NULL cell; 

— each pointer variable points to any cell (including the NULL cell); 

— a cell is marked as free if and only if it is not reachable from a program 
variable; and 

— the type declarations are respected — this includes the requirement that a cell 
representing a structure has an outgoing pointer for each structure pointer 
declared in its active union member. 

With the techniques described in the remainder of this paper, it is possible to 
automatically verify whether well-formedness is preserved by all functions in a 
given program. Furthermore, additional user defined properties expressed in the 
logic presented in Section 2.3 can be verified. 
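
To make the first two well-formedness conditions concrete, the sketch below checks them dynamically for one store. It is a simplification under stated assumptions: every cell has exactly two pointer fields, the free marks and type declarations are ignored, and the names Cell, visit, and disjoint_trees are hypothetical. The verification technique itself establishes these properties statically rather than by traversal.

```c
#include <assert.h>
#include <stddef.h>

struct Cell {
    struct Cell *left, *right;
    int visited;                 /* scratch mark used only by the checker */
};

/* Returns 1 if every cell below c is reached for the first time. */
static int visit(struct Cell *c) {
    if (c == NULL) return 1;     /* the NULL cell may be shared */
    if (c->visited) return 0;    /* sharing or a cycle: not a tree */
    c->visited = 1;
    return visit(c->left) && visit(c->right);
}

/* Checks that the data variables vars[0..n-1] point to disjoint trees.
   cells[0..m-1] lists all cells of the store so the marks can be reset. */
static int disjoint_trees(struct Cell **vars, size_t n,
                          struct Cell *cells, size_t m) {
    size_t i;
    int ok = 1;
    for (i = 0; i < m; i++) cells[i].visited = 0;
    for (i = 0; i < n; i++) ok = ok && visit(vars[i]);
    return ok;
}
```

A data variable that points into the interior of another variable's tree is rejected as well, since the shared cell is reached twice.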

The following illustrates an example of a well-formed store containing some 
RGB-trees as described in Section 1. Tree edges are solid lines whereas the values 
of pointer variables are dashed lines; free cells are solid black: 




2.3 Store Logic 

Properties of stores can conveniently be stated using logic. The declarative and 
succinct nature of logic often allows simple specifications of complex require- 
ments. The logic presented here is essentially a first-order logic on finite tree 
structures [20]. It has the important characteristic of being decidable, which we
will exploit for the program verification. 

A formula φ in the store logic is built from boolean connectives, first-order
quantifiers, and basic propositions. A term t denotes either an enumeration value
or a pointer to a cell in the store. A path set P represents a set of paths, where
a path is a sequence of pointer dereferences and union member selections ending
in either a pointer or an enumeration field. The signature of the logic consists of
dereference functions, path relations, and the relations free and root:

φ → ! φ |
    φ & φ |
    φ | φ |
    φ => φ |
    φ <=> φ |
    ex id : φ | all id : φ | true | false |
    id ( P )? == t | free( t ) | root( t )

A path relation, id P == t, compares either enumeration values or cell pointers.
The identifier id may be either a bound quantified variable, a program variable, 
or an enumeration value, and t is a term. 

If both id and t denote cell pointers, a path relation is true for a given store 
if there is a path in P from the cell denoted by id to the cell denoted by t in the 
store. If P is omitted, the relation is true if id and t denote the same cell. 

If id denotes a cell pointer and t is an enumeration value, a path relation is 
true for a given store if there is a path satisfying P from the cell denoted by id 
to an enumeration field with the value t in the store. 

The relation free(t) is true in a given store if the cell denoted by t is marked
as not allocated in the store. The relation root(t) is true if t denotes the root
of some tree.

A term is a sequence of applications of the dereference function and union 
member selections or the constant 0 representing the special NULL cell: 

t → id ( -> id ( . id )? )* | 0

A path set is a regular expression:

P → -> id ( . id )? | P + P | P P | P *

The path set defined by ->id₁.id₂ consists of a single dereference of id₁ and
subsequent selection of the member id₂. The expressions P + P, P P, and P *
respectively denote union, concatenation, and Kleene star.



2.4 Program Annotations and Hoare Triples 

The verification technique is based on Hoare triples [9], that is, constructs of the
form {φ₁} stm {φ₂}. The meaning of this triple is that executing the statement
stm in a store satisfying the pre-condition φ₁ always results in a store satisfying
the post-condition φ₂, provided that the statement terminates. Well-formedness
is always implicitly included in both φ₁ and φ₂. We can only directly decide such
triples for loop-free code. Programs containing loops — either as while-loops or 
as function calls — must be split into loop-free fragments. 

A program can be annotated with formulas expressing requirements for cor- 
rectness using a family of designated comments. These annotations are also used 
to split the program into a set of Hoare triples that subsequently can be verified 
separately. 

/**pre: φ **/ and /**post: φ **/ may be placed between the signature and
the body of a function. The pre formula expresses a property that the verifier 
may assume initially holds when the function is executed. The post formula 
expresses a property intended to hold after execution of the function. The 
states before and after execution may be related using otherwise unused 
variables. 

/**inv: φ **/ may be placed between the condition and the body of a
while-loop. It expresses an invariant property that must hold before execution of
the loop and after each iteration. It splits the code into three parts: the
statements preceding the while-loop, its body, and the statements following
it.

/**keep: φ **/ may be placed immediately after a function call. It expresses
a property that must hold both before and after the call. It splits the code 
into two parts: the statements before and after the call. The keep formulas 
can specify invariants for recursive function calls just as inv formulas can 
specify invariants for while-loops. 

/**assert: φ **/ may be placed between statements. It splits the statement
sequence, such that a Hoare triple is divided into two smaller triples, where
the post-condition of the first and the pre-condition of the second both are
φ. This allows modular analysis to be performed. The variant /**assert: φ
assume: φ **/ allows the post-condition and the pre-condition to be
different, and thereby to weaken the verification requirements. This is needed
whenever a sufficiently strong property either cannot be expressed or requires
infeasible computations.

/**check: φ **/ stm informally corresponds to “if (!φ) fail; else stm”,
where fail is some statement that fails to verify. This can be used to check
that a certain property holds without creating two Hoare triples incurring a 
potential loss of information. 

Whenever a pre- or post-condition, an invariant, or a keep-formula is omitted, the 
default formula true is implicitly inserted. Actually, many interesting properties 
can be verified with just these defaults. As an example, the program: 

/**data**/ struct RGB *x; 
struct RGB *p; 
struct RGB *q; 

p = x; 
q = 0; 

while (p!=0 & q==0) /**inv: q!=0 => q->color==red **/ {
if (p->color==red) q = p; 
else if (p->color==green) p = p->left; 
else /**assert: p->color==blue **/ p = p->right; 

} 

yields the following set of Hoare triples and logical implications to be checked: 

{ true } p = x; q = 0; { I }

( I & !B ) => true

{ I & B & B1 } q = p; { I }

{ I & B & !B1 & B2 } p = p->left; { I }

( I & B & !B1 & !B2 ) => ( p->color==blue )

{ I & B & !B1 & !B2 & p->color==blue } p = p->right; { I }

where B is the condition of the while-loop, I is the invariant, B1 is the condition
of the outer if-statement and B2 that of the inner if-statement. Note that the 
generated Hoare triples are completely independent of each other — when a triple 
is divided into two smaller triples, no information obtained from analyzing the 
first triple is used when analyzing the second. 
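
For intuition, the annotated loop can also be read as ordinary C in which each annotation becomes a runtime assert at the point where the corresponding triple ends. This is only an informal dynamic analogue, wrapped in a hypothetical function find_red with a hypothetical macro INV for the invariant; the tool checks the triples statically for all stores instead of executing anything.

```c
#include <assert.h>
#include <stddef.h>

struct RGB {
    enum { red, green, blue } color;
    struct RGB *left;
    struct RGB *right;
};

/* The invariant I:  q!=0 => q->color==red */
#define INV(q) ((q) == NULL || (q)->color == red)

static struct RGB *find_red(struct RGB *x) {
    struct RGB *p = x;
    struct RGB *q = NULL;
    assert(INV(q));                      /* { true } p = x; q = 0; { I } */
    while (p != NULL && q == NULL) {     /* the loop condition B */
        if (p->color == red) q = p;                  /* B1 */
        else if (p->color == green) p = p->left;     /* !B1 & B2 */
        else {
            assert(p->color == blue);    /* the implication to be checked */
            p = p->right;
        }
        assert(INV(q));                  /* every branch re-establishes I */
    }
    return q;
}
```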

3 Deciding Hoare Triples

The generated Hoare triples and logical implications — both the formula parts 
and the program parts — can be encoded in the logic WS2S which is known to be 
decidable. This encoding method follows directly from [11] by generalizing from 
list structures to tree structures in the style of [16]. The MONA tool provides 
an implementation of a decision procedure for WS2S, so in principle making a 
decision procedure for the present language requires no new ideas. 

As we show in the following, this method will however lead to infeasible com- 
putations making it useless in practice. The solution is to exploit the full power 
of the MONA tool: usually, WS2S is decided using a correspondence with ordi- 
nary tree automata — MONA uses a representation called guided tree automata, 
which when used properly can be exponentially more efficient than ordinary tree 
automata. However, such a gain requires a non-obvious encoding. 

We will not describe how plain MONA code directly can be generated from 
the Hoare triples and logical implications. Instead we introduce a logic called 
WSRT, weak monadic second-order logic with recursive types, which separates 
the encoding into two parts: the Hoare triples and logical implications are first 
encoded in WSRT, and then WSRT is translated into basic MONA code. This 
has two benefits: WSRT provides a higher level of abstraction for the encoding 
task, and, as a by-product, we get an efficient implementation of a general tree 
logic which can be applied in many other situations where WS2S and ordinary 
tree automata have so far been used. 

3.1 Weak Monadic Second-Order Logic with Recursive Types 

A recursive type is a set of recursive equations of the form: 

$T = v_1(c_{1,1} : T_{j_{1,1}}, \ldots, c_{1,m_1} : T_{j_{1,m_1}}), \; \ldots, \; v_n(c_{n,1} : T_{j_{n,1}}, \ldots, c_{n,m_n} : T_{j_{n,m_n}})$

Each T denotes the name of a type, each v is called a variant, and each c is 
called a component. A tree conforms to a recursive type T if its root is labeled 
with a variant v from T and it has a successor for each component in v such that 
the successor conforms to the type of that component. Note that types defined 
by structs in the language in Section 2.1 exactly correspond to such recursive 
types. 
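
The conformance condition can be sketched as a recursive check over a table of types. The representation below is an assumption made for illustration (types and variants are referred to by index, and each variant has at most MAX_COMP components); none of these names come from the WSRT implementation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COMP 2

struct Variant {
    int ncomp;                   /* number of components of this variant */
    int comp_type[MAX_COMP];     /* comp_type[i]: index of the type of component i */
};

struct Type {
    int nvar;                    /* number of variants */
    const struct Variant *var;
};

struct Tree {
    int type;                    /* index of the type this node claims to belong to */
    int variant;                 /* index of its variant within that type */
    struct Tree *succ[MAX_COMP]; /* one successor per component */
};

/* A tree conforms to type T if its root carries a variant of T and each
   successor conforms to the type of the corresponding component. */
static int conforms(const struct Tree *t, int T, const struct Type *types) {
    const struct Variant *v;
    int i;
    if (t == NULL || t->type != T) return 0;
    if (t->variant < 0 || t->variant >= types[T].nvar) return 0;
    v = &types[T].var[t->variant];
    for (i = 0; i < v->ncomp; i++)
        if (!conforms(t->succ[i], v->comp_type[i], types)) return 0;
    return 1;
}
```

With a single type of four variants, one without components and three with two components each, this models the RGB trees of Section 1 once NULL is treated as a variant of its own.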

The logic WSRT is a weak monadic second-order logic. Formulas are inter- 
preted relative to a set of trees conforming to recursive types. Each node is 
labeled with a variant from a recursive type. A tree variable denotes a tree con- 
forming to a fixed recursive type. A first-order variable denotes a single node. A 
second-order variable denotes a finite set of nodes. 

A formula is built from the usual boolean connectives, first-order and weak 
monadic second-order quantifiers, and the special WSRT basic formulas: 

type(t, T) which is true iff the first-order term t denotes a node which is
labeled with some variant from the type T; and
variant(t, x, T, v) which is true iff the tree denoted by the tree variable x at the
position denoted by t is labeled with the T variant v.

Second-order terms are built from second-order variables and the set operations
union, intersection and difference. First-order terms are built from first-order
variables and the special WSRT functions:

tree_root(x) which evaluates to the root of the tree denoted by x; and
succ(t, T, v, c) which, provided that the first-order term t denotes a node of the
T variant v, evaluates to its c component.

This logic corresponds to the core of the FIDO language [16] and is also remi- 
niscent of the LISA language [1]. It can be reduced to WS2S and thus provides 
no more expressive power, but we will show that a significantly more efficient 
decision procedure exists if we bypass WS2S. 



3.2 Encoding Stores and Formulas in WSRT 

The idea behind the decision procedure for Hoare triples is to encode well-formed 
stores as trees. The effect of executing a loop-free program fragment is then in a
finite number of steps to transform one tree into another. WSRT can conveniently 
be used to express regular sets of finite trees conforming to recursive types, which 
turns out to be exactly what we need to encode pre- and post-conditions and 
effects of execution. 

We begin by making some observations that simplify the encoding task. First, 
note that NULL pointers can be represented by adding a “NULL kind” with no 
successors to all structures. Second, note that memory allocation issues can be 
represented by having a “free list” for each struct, just as in [11]. We can now 
represent a well-formed store by a set of WSRT variables: 

— each data variable is represented by a WSRT tree variable with the same 
recursive type, where we use the fact that the types defined by structs 
exactly correspond to the WSRT notion of recursive types; and 

— each pointer variable in the program is represented by a WSRT first-order 
variable. 

For each program point, a set of WSRT predicates called store predicates is used 
to express the possible stores: 

— for each data variable d in the program, the predicate root_d(t) is true
whenever the first-order term t denotes the root of d;

— for each pointer variable p, the predicate pos_p(t) is true whenever t and p
denote the same position;

— for each pointer field f occurring in a union u in some structure s, the
predicate succ_{f,u,s}(t₁, t₂) is true whenever the first-order term t₁ points to
a cell of type s having the value u, and the f component of this cell points
to the same node as the first-order term t₂;

— for each possible enumeration value e, the predicate kind_e(t) is true whenever
t denotes a cell with value e; and

— to encode allocation status, the predicate free_s(t) is true whenever t denotes
a non-allocated cell.

A set of store predicates called the initial store predicates defining a mapping
of the heap into WSRT trees can easily be expressed in the WSRT logic. For 
instance, the initial store predicates root, succ, and kind simply coincide with 
the corresponding basic WSRT constructs. 

Based on a set of store predicates, the well-formedness property and all store- 
logic formulas can be encoded as other predicates. For well-formedness, the re- 
quirements of the recursive types are expressed using the root, kind, and succ 
predicates, and the requirement that all data structures are disjoint trees is a 
simple reachability property. For store-logic formulas, the construction is in- 
ductive: boolean connectives and quantifiers are directly translated into WSRT; 
terms are expressed using the store predicates root, kind, and succ; and the basic 
formulas free(t) and root(t) can be expressed using the store predicates free
and root. Only the regular path sets are non-trivial; they are expressed in WSRT 
using the method from [14] (where path sets are called “routing expressions”). 
Note that even though the logic in Section 2.3 is a first-order logic, we also need 
the weak monadic second-order fragment of WSRT to express well-formedness 
and path sets. 



3.3 Predicate Transformation 

When the program has been broken into loop-free fragments, the Hoare triples 
are decided using the transduction technique introduced in [15]. In this techni- 
que, the effect of executing a loop-free program fragment is simulated, step by 
step, by transforming store predicates accordingly, as described in the following. 

Since the pre-condition of a Hoare triple always implicitly includes the well- 
formedness criteria, we encode the set of pre-stores as the conjunction of well- 
formedness and the pre-condition, both encoded using the initial store predicates, 
and we initiate the transduction with the initial store predicates. For each step, 
a new set of store predicates is defined representing the possible stores after 
executing that step. This predicate transformation is performed using the same 
ideas as in [11], so we omit the details. 

When all steps in this way have been simulated, we have a set of final store 
predicates which exactly represents the changes made by the program fragment. 
We now encode the set of post-stores as the conjunction of well-formedness and 
the post-condition, both encoded using the final store predicates. It can be shown 
that the resulting predicate representing the post-stores coincides with the wea- 
kest precondition of the code and the post-condition. The Hoare triple is satisfied 
if and only if the encoding of the pre-stores implies the encoding of the post- 
stores. 

Our technique is sound: if verification succeeds, the program is guaranteed to 
contain no errors. For loop-free Hoare triples, it is also complete. That is, every 
effect on the store can be expressed in the store logic, and this logic is decidable. 
In general, no approximation takes place — all effects of execution are simulated 
precisely. Nevertheless, since not all true properties of a program containing 
loops can be expressed in the logic, the technique is in general not complete for
whole programs.

4 Deciding WSRT

As mentioned, there is a simple reduction from WSRT to WS2S, and WS2S 
can be decided using a well-known correspondence between WS2S formulas and 
ordinary tree automata. The resulting so-called naive decision procedure for 
WSRT is essentially the same as the ones used in FIDO and LISA and as the 
“conventional encoding of parse trees” in [4]. The naive decision procedure along
with its deficiencies is described in Section 4.1. In Section 4.2 we show an effi- 
cient decision procedure based on the more sophisticated notion of guided tree 
automata. 

4.1 The Naive Decision Procedure 

WS2S, the weak monadic second-order logic of two successors, is a logic that is 
interpreted relative to a binary tree. A first-order variable denotes a single node 
in the tree, and a second-order variable denotes a finite set of nodes. For a full 
definition of WS2S, see [20] or [13]. 

The decision procedure implemented in MONA inductively constructs a tree 
automaton for each sub-formula, such that the set of trees accepted by the 
automaton is the set of interpretations that satisfy the sub-formula. This decision 
procedure not only determines validity of formulas; it also allows construction 
of counterexamples whenever a formula is not valid. 

Note that the logic WSnS, where each node has n successors instead of just 
two, easily can be encoded in WS2S by replacing each node with a small tree 
with n leaves. The idea in the encoding is to have a one-to-one mapping from 
nodes in a WSRT tree to nodes in a WSnS tree, where we choose n as the 
maximal fanout of all recursive types. 

Each WSRT tree variable x is now represented by b second-order variables
v_1, ..., v_b where b is the number of bits needed to encode the possible type
variants. For each node in the n-ary tree, membership in v_1, ..., v_b represents
some binary encoding of the label of the corresponding node in the x tree.
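
The label encoding can be sketched as follows. The text only requires "some binary encoding"; the particular convention used here (the set v_i contains the nodes whose label has bit i-1 set) and the function names are assumptions for illustration.

```c
#include <assert.h>

/* Number of bits b needed to distinguish nvariants labels (nvariants >= 1). */
static unsigned bits_needed(unsigned nvariants) {
    unsigned b = 0;
    assert(nvariants >= 1);
    while (b < 32 && (1u << b) < nvariants) b++;
    return b;
}

/* Does a node labeled `label` belong to the set v_i, for 1 <= i <= b?
   Assumed convention: v_i holds the nodes whose label has bit i-1 set. */
static int in_set(unsigned label, unsigned i) {
    assert(i >= 1);
    return (int)((label >> (i - 1)) & 1u);
}

/* Inverse direction: recover the label from the b membership bits,
   where member[i-1] tells whether the node belongs to v_i. */
static unsigned decode(const int member[], unsigned b) {
    unsigned label = 0, i;
    for (i = 0; i < b; i++)
        if (member[i]) label |= 1u << i;
    return label;
}
```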

Using this representation, all the basic WSRT formulas and functions can now 
easily be expressed in WSnS. We omit the details. For practical applications, this 
method leads to intractable computations requiring prohibitive amounts of time 
and space. Even a basic concept such as type well-formedness yields immense 
automata. Type well-formedness is the property that the values of a given set of 
WS2S variables do represent a tree of a particular recursive type. 

This problem can be explained as follows. The WS2S encoding is essentially 
the same as the “conventional encoding of parse trees” in [4], and type well- 
formedness corresponds to grammar well-formedness. In that paper, it is shown 
that the number of states in the automaton corresponding to the grammar well- 
formedness predicate is linear in the size of the grammar, which in our case 
corresponds to the recursive types. As argued e.g. in [12], tree automata are 
at least quadratically more difficult to work with than string automata, since 
the transition tables are two-dimensional as opposed to one-dimensional. This 
inevitably causes a blowup in time and space requirements for the whole decision 
procedure. 
By this argument, it would be pointless making an implementation based
on the described encoding. This claim is supported by experiments with some 
very simple examples; in each case, we experienced prohibitive time and space 
requirements. 



4.2 A Decision Procedure using Guided Tree Automata 

The MONA implementation of WS2S provides an opportunity to factorize the 
state-space and hence make implementation feasible. To exploit this we must, 
however, change the encoding of WSRT trees, as described in the following. 

The notion of guided tree automata (GTA) was introduced in [2] to combat 
state-space explosions and is now fully implemented in MONA [13]. A GTA is a 
tree automaton equipped with separate state spaces that — independently of the 
labeling of the tree — are assigned to the tree nodes by a top-down automaton, 
called the guide. The secret behind a good factorization is to create the right 
guide. 

A recursive type is essentially also a top-down automaton, so the idea is to 
derive a guide from the recursive types. This is however not possible with the 
naive encoding, since the type of a WSnS node depends on the actual value of 
its parent node. 

Instead of using the one-to-one mapping from WSRT tree nodes to WSnS 
tree nodes labeled with type variants, we represent a WSRT tree entirely by the 
shape of a WSnS tree, similarly to the “shape encoding” in [4]. Each node in 
the WSRT tree is represented by a WSnS node with a successor node for each 
variant, and each of these nodes have themselves a successor for each component 
in the variant. A WSRT tree is then represented by a single second-order WSnS 
variable whose value indicates the active variants. 

The following illustrates an example of a tree conforming to the recursive
type Tree = A(left: Tree, right: Tree), B(next: Tree), NULL and its encodings:

        A
       / \
    NULL   B
           |
          NULL

    (a) a tree

This encoding has the desired property that a WSnS tree position always is 
assigned the same type, independently of the tree values, so a GTA guide can 
directly be derived from the types. This guide factorizes the state space such 
that all variants and components in the recursive types have their own separate 
state spaces. Furthermore, the intermediate nodes caused by the WSnS to WS2S
transformation can now also be given separate state spaces, yielding yet another
degree of factorization.
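
The shape encoding can be sketched in Python as follows. This is only an illustration under assumptions: positions are written as paths of child indices in the n-ary tree, the TYPE table and the function names are invented for the example, and the further step from WSnS to WS2S is not modeled.

```python
# Tree = A(left: Tree, right: Tree), B(next: Tree), NULL
# Each type maps to its variants; each variant lists its component types.
TYPE = {"Tree": [("A", ["Tree", "Tree"]), ("B", ["Tree"]), ("NULL", [])]}

def shape_encode(tree, type_name="Tree", pos=()):
    """Return the set of positions of the active variant nodes.

    tree is (variant_name, [subtrees]). A type node at position pos has one
    successor per variant, and each variant node one successor per component.
    """
    variant, children = tree
    names = [name for name, _ in TYPE[type_name]]
    k = names.index(variant)
    variant_pos = pos + (k,)
    active = {variant_pos}
    components = TYPE[type_name][k][1]
    for c, (child, child_type) in enumerate(zip(children, components)):
        active |= shape_encode(child, child_type, variant_pos + (c,))
    return active

def type_at(pos, type_name="Tree"):
    """Type assigned to a type-node position (a path of even length).

    The result depends only on the position and the type declarations,
    never on the value of the encoded tree.
    """
    for i in range(0, len(pos), 2):
        variant_idx, comp_idx = pos[i], pos[i + 1]
        type_name = TYPE[type_name][variant_idx][1][comp_idx]
    return type_name

# The example tree: A(NULL, B(NULL)).
example = ("A", [("NULL", []), ("B", [("NULL", [])])])
encoded = shape_encode(example)
```

The single set encoded plays the role of the one second-order variable, and type_at mirrors how a guide can be read off the type declarations alone.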

One consequence is that type well-formedness now can be represented by 
a GTA with a constant number of states in each state space. The size of this 
automaton is thus reduced from quadratic to linear in the size of the type. Similar
improvements are observed for other predicates. 

With these obstacles removed, implementation becomes feasible with typical 
data-type operations verified in seconds. In fact, for the linear sub-language, our 
new decision procedure is almost as fast as the previous WS1S implementation;
for example, the programs reverse and zip from [11] are now verified in 2.3 
and 29 seconds instead of the previous times of 2.7 and 10 seconds (all using the 
newest version of MONA). This is remarkable, since our decision procedure suffers 
a quadratic penalty from using tree automata rather than string automata. 

5 Conclusion 

By introducing the WSRT logic and exploiting novel features of the MONA 
implementation, we have built a tool that catches pointer errors in programs 
working on recursive data structures. Together with assisting tools for extracting 
counterexamples and graphical program simulations, this forms the basis for a 
compile-time debugger that is sound and furthermore complete for loop-free 
code. The inherent non-elementary lower bound of WSnS will always limit its 
applicability, but we have shown that it handles some realistic examples. 

Among the possible extensions or variations of the technique are allowing 
parent and root pointers in all structures, following the ideas from [14], and 
switching to a finer store granularity to permit casts and pointer arithmetic. 
A future implementation will test these ideas. Also, it would be interesting to 
perform a more detailed comparison of the technique presented here with pointer 
analysis and shape analysis techniques. 
Abstract. We describe λinkς (pronounced “links”), a low-level calculus designed
to serve as the basis for an intermediate representation in compilers for class-based
object-oriented languages. The primitives in λinkς can express a wide range of
class-based object-oriented language features, including various forms of inheritance,
method override, and method dispatch. In particular, λinkς can model
the object-oriented features of Moby, OCaml, and Loom, where subclasses may
be derived from unknown base classes. λinkς can also serve as the intermediate
representation for more conventional class mechanisms, such as Java’s. In this
paper, we formally describe λinkς, give examples of its use, and discuss how
standard compiler transformations can be used to optimize programs in the λinkς
representation.



1 Introduction 

Class-based object-oriented languages provide mechanisms for factoring code into a 
hierarchy of classes. For example, the implementation of a text window may be split 
into a base class that implements windows and a subclass that supports drawing text. 
Since these classes may be defined in separate compilation units, compilers for such 
languages need an intermediate representation (IR) that allows them to represent code 
fragments (e.g., the code for each class) and to generate linkage information to assem- 
ble the fragments. For languages with manifest class hierarchies (i.e., languages where 
subclass compilation requires the superclass representation, as is the case in C++ [Str97] 
and Java [AG98]), representing code fragments and linkage information is straightfor- 
ward. But for languages that allow classes as module parameters, such as Moby [FR99a] 
and OCaml [RV98,Ler98], or languages that have classes as first-class values, such as 
Loom [BFP97], the design of an IR becomes trickier (Section 2 illustrates the compli- 
cations). 

We are interested in a compiler IR that can handle inheritance from non-manifest 
base classes. In addition, the IR should satisfy a number of other important criteria. 
The IR should be expressive enough to support a wide range of statically typed surface 
languages from Java to Loom. The IR should be reasonably close to the machine and 
should be able to express efficient object representations (e.g., shared method suites) 
and both static and dynamic method dispatch. The IR should enable optimizations based 
on simple and standard transformations. Lastly, the IR should be amenable to formal 
reasoning about compiler transformations and class linking. 

G. Smolka (Ed.): ESOP/ETAPS 2000, LNCS 1782, pp. 135-149, 2000. 

(c) Springer- Verlag Berlin Heidelberg 2000 
This paper presents λinkς, which is an extension of the untyped λ-calculus that meets
these design goals. λinkς extends the λ-calculus with method suites, which are ordered
collections of methods; slots, which index into method suites; and dictionaries, which
map method labels to slots. In λinkς, method dispatch is implemented by first using
a dictionary to find the method’s slot and then using the slot to index into a method
suite. λinkς can support true private names and avoid the fragile base class problem by
dynamically changing the dictionary associated with an object when the object’s view of
itself changes [RS98,FR99b]. Separating dynamic dispatch into two pieces also enables
more compiler optimizations. In this paper, we treat λinkς as a compiler IR, although
the reader should think of it as more of a framework or basis for a compiler’s IR.

By design, λinkς satisfies our goals. Because of the abstractions in λinkς, it can
express a wide range of surface class mechanisms, from the static classes found in Java
through the dynamic inheritance of Loom (Section 5). By making λinkς untyped, we
avoid limiting the applicability of λinkς to languages with incompatible type systems.
The operations in the calculus allow compilers to leverage static information to optimize
message dispatch. For example, the type system in C++ guarantees the slot at which each
method may be found at run-time. In λinkς, we may use this information to evaluate
the dictionary lookup operation associated with message dispatch at compile time,
providing the expected efficiency for message dispatch to C++ programmers. Because
λinkς is based on the λ-calculus, familiar λ-calculus optimizations apply immediately to
λinkς (Section 6), and these optimizations yield standard object-oriented optimizations
when applied to λinkς programs. Consequently, ad-hoc optimizations for the
object-oriented pieces of a compiler based on λinkς are not necessary. Because λinkς is a formal
language, it is amenable to formal reasoning. For example, one can show that λinkς is
confluent and that the reductions tagged as linking redexes are strongly normalizing
(Section 4).

In the next section, we discuss the challenges involved in implementing inheritance
from an unknown base class. In Section 3, we present the syntax, operational semantics,
and rewrite systems of λinkς. To keep the discussion focused, we restrict the technical
presentation to a version of λinkς with methods, but no fields (instance variables). The
techniques used to handle methods apply directly to fields (see Section 5.1). Section 4
defines a simple class language Scl and shows how it can be translated to λinkς. We
prove that the translation of any “well-ordered” Scl program has the property that all
linking steps can be reduced statically. In Section 5, we sketch how λinkς can serve as an
IR for Moby, Loom, a mixin extension for Scl, and C++. Section 6 further demonstrates
the utility of the rewriting system for λinkς by showing how method dispatch can be
optimized in the calculus. We conclude with a discussion of related and future work.

2 Inheritance from Unknown Classes 

One of our principal design goals is to support inheritance from unknown base classes. 
Figure 1 shows where difficulties can arise when compiling languages with such a feature. 
The example is written in Moby, although similar examples can be written in Loom and 
OCaml. The module in Figure 1 defines a class ColorPt that extends an unknown base
class Pt.Point by inheriting its getX and getY methods, overriding its move method,
and adding a color field. When compiling the module, the compiler knows only that
the Pt.Point superclass has three methods (getX, getY, and move). The compiler
signature PT {
  class Point : {
    public meth getX : Unit -> Int
    public meth getY : Unit -> Int
    public meth move : (Int, Int) -> Unit
  }
}

module ColorPtFn (Pt : PT) {
  class ColorPt {
    inherits Pt.Point
    field c : Color

    public meth move (x : Int, y : Int) -> Unit {
      if (self.c == Red)
        then super.move (2*x, 2*y)
        else super.move (x, y)
    }
  }
}

Fig. 1. Inheriting from an unknown superclass



does not know in what order these methods appear in the internal representation of the 
Point class, nor what other private methods and fields the Point class might have. 
As an example of inheritance from such a class, suppose we have a class PolarPt
that implements the Pt.Point interface and has additional polar-coordinate methods
getTheta and getRadius. When we apply the ColorPtFn module to PolarPt, we 
effectively hide the polar-coordinate methods, making them private and allowing their 
names to be reused for other, independent methods in ColorPt and its descendants. Such 
private methods, while hidden, are not forgotten, since they may be indirectly accessible 
from other visible methods (e.g., the PolarPt class might implement getX in terms of 
polar coordinates). This hiding is a problem when compiling the PolarPt class, since 
its code must have access to methods that might not be directly available in its eventual 
subclasses. 

3 λinkς

λinkς is a λ-calculus with method suites, slots, and dictionaries, which provides a notation
for class assembly, inheritance, dynamic dispatch, and other object-oriented features.



3.1 Syntax 

The syntax of λinkς is given by the grammar in Figure 2. In addition to the standard
λ-calculus forms, there are eight expression forms for supporting objects and classes. The
term $\langle e_1, \ldots, e_n \rangle$ constructs a method suite from the expressions $e_1, \ldots, e_n$, where each
$e_i$ is assigned slot i. The expression $e@e'$ extracts the value stored in the slot denoted by
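\[
\begin{array}{rcll}
e & ::= & x & \text{variable} \\
  & \mid & \lambda x.e \;\mid\; e(e') & \text{function abstraction/application} \\
  & \mid & (e_1, \ldots, e_n) \;\mid\; \pi_i\, e & \text{tuple creation/projection} \\
  & \mid & \langle e_1, \ldots, e_n \rangle & \text{method suite construction} \\
  & \mid & e @ e' & \text{method suite indexing} \\
  & \mid & e \,\|\, e' & \text{method suite extension} \\
  & \mid & e @ e' \leftarrow e'' & \text{method override} \\
  & \mid & i & \text{slot} \\
  & \mid & e + e' & \text{slot addition} \\
  & \mid & \{ m_1 \mapsto e_1, \ldots, m_n \mapsto e_n \} & \text{dictionary construction} \\
  & \mid & e\,!\,m & \text{dictionary application}
\end{array}
\]

Fig. 2. The syntax of λinkς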



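\[
\begin{array}{rcll}
(\lambda x.e)(v) & \hookrightarrow & e[x \mapsto v] & \\
\pi_i(v_1, \ldots, v_n) & \hookrightarrow & v_i & \text{where } 1 \le i \le n \\
i + j & \hookrightarrow & k & \text{where } k = i + j \\
\{ m_1 \mapsto v_1, \ldots, m_n \mapsto v_n \}\,!\,m_i & \hookrightarrow & v_i & \text{where } 1 \le i \le n \\
\langle v_1, \ldots, v_n \rangle \,\|\, \langle v'_1, \ldots, v'_{n'} \rangle & \hookrightarrow & \langle v_1, \ldots, v_n, v'_1, \ldots, v'_{n'} \rangle & \\
\langle v_1, \ldots, v_i, \ldots, v_n \rangle @ i \leftarrow v' & \hookrightarrow & \langle v_1, \ldots, v', \ldots, v_n \rangle & \text{where } 1 \le i \le n \\
\langle v_1, \ldots, v_n \rangle @ i & \hookrightarrow & v_i & \text{where } 1 \le i \le n
\end{array}
\]

Fig. 3. Reduction rules for λinkς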

$e'$ from the method suite denoted by $e$. The method suite extension $e \,\|\, e'$ concatenates the
suites $e$ and $e'$. The last method suite operation is override, which functionally updates
a slot in a given suite to produce a new suite. A slot is specified by a slot expression,
which is either an integer i or the addition of two slot expressions. The expression
$\{ m_1 \mapsto e_1, \ldots, m_n \mapsto e_n \}$ denotes a dictionary where each label $m_i$ is mapped to the
slot denoted by $e_i$. Application of a dictionary to a label m is written $e\,!\,m$.
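
To make these operations concrete, here is a minimal Python sketch that models method suites as tuples and dictionaries as Python dicts. The function names (suite, index, extend, override, lookup) and the string placeholders for method bodies are assumptions chosen for illustration; slots are 1-based, following the convention that the i-th entry of a suite is assigned slot i.

```python
def suite(*methods):
    """Method suite construction <e1, ..., en>; slot i holds the i-th entry."""
    return tuple(methods)

def index(mu, slot):
    """Method suite indexing mu@slot (slots are 1-based)."""
    if not 1 <= slot <= len(mu):
        raise IndexError("slot out of range")
    return mu[slot - 1]

def extend(mu1, mu2):
    """Method suite extension mu1 || mu2: concatenation of the two suites."""
    return mu1 + mu2

def override(mu, slot, value):
    """Method override mu@slot <- value; returns a new suite, mu is unchanged."""
    if not 1 <= slot <= len(mu):
        raise IndexError("slot out of range")
    return mu[:slot - 1] + (value,) + mu[slot:]

def lookup(phi, label):
    """Dictionary application phi!label: the slot assigned to a method label."""
    return phi[label]

# Two-step dispatch: the dictionary yields a slot, the slot indexes the suite.
mu = suite("move_code", "getX_code")
phi = {"getX": 2, "move": 1}
found = index(mu, lookup(phi, "getX"))
```

Because override builds a new tuple, the original suite stays available, which matches the functional-update reading of the override operation.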

We identify terms up to the renaming of bound variables and use $e[x \mapsto e']$ to denote
the capture-free substitution of $e'$ for $x$ in $e$. We assume that dictionaries are unordered
and must represent finite functions. For instance, the dictionary $\{ m \mapsto 1, m \mapsto 2 \}$ is an
ill-formed expression, since it maps m to two different values. To simplify notation, we
use the following shorthands:

\[
\begin{array}{lcl}
\mathbf{let}\ x = e\ \mathbf{in}\ e' & \text{for} & (\lambda x.e')(e) \\
\lambda(x_1, \ldots, x_n).e & \text{for} & \lambda p.((\lambda x_1.\cdots\lambda x_n.e)(\pi_1 p)\cdots(\pi_n p))
\end{array}
\]

3.2 Operational Semantics 

We specify the operational semantics of λinkς using an evaluation-context based rewrite
system [FF86]. Such systems rewrite terms step-by-step until no more steps can be taken.
At each step, the term to be reduced is parsed into an evaluation context and a redex.
The redex is then replaced, and evaluation begins anew with another parsing of the term.
Note that since λinkς is untyped there are legal expressions, such as $\pi_1(\lambda x.e)$, that
cannot be reduced.

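Two grammars form the backbone of the semantics. The first describes values, a
subset of expressions that are in reduced form:

\[
v ::= x \mid \lambda x.e \mid (v_1, \ldots, v_n) \mid \langle v_1, \ldots, v_n \rangle \mid i \mid \{ m_1 \mapsto v_1, \ldots, m_n \mapsto v_n \}
\]

The second grammar describes the set of evaluation contexts.

\[
\begin{array}{rcl}
E & ::= & [\cdot] \mid E(e) \mid v(E) \mid \pi_i\, E \\
  & \mid & (v_1, \ldots, E, \ldots, e_n) \mid \langle v_1, \ldots, E, \ldots, e_n \rangle \mid E \,\|\, e \mid v \,\|\, E \\
  & \mid & E @ e \leftarrow e \mid v @ E \leftarrow e \mid v @ v \leftarrow E \mid E @ e \mid v @ E \\
  & \mid & E + e \mid v + E \mid \{ m_1 \mapsto v_1, \ldots, m_i \mapsto E, \ldots, m_n \mapsto e_n \} \mid E\,!\,m
\end{array}
\]

The primitive reduction rules for λinkς are given in Figure 3. We write $e \longmapsto e'$ if
$e = E[e_0]$, $e' = E[e'_0]$, and $e_0 \hookrightarrow e'_0$ by one of the rules above.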

3.3 Reduction System 

Under the operational semantics, there is no notion of transforming a program before
it is run: all reductions happen when they are needed. We want, however, a method for
rewriting λinkς terms to equivalent, optimized versions. The basis of the rewrite system
is the relation $\hookrightarrow$ of Figure 3. We write $\rightarrow$ for the congruence closure of this relation; i.e., for the
system in which rewrites may happen anywhere inside a term. For example, reductions
like $(\lambda x.\pi_1(v_1, x))(e) \rightarrow (\lambda x.v_1)(e)$ are possible, whereas in the operational semantics
they are not. We write $\rightarrow^*$ for the reflexive, transitive closure of $\rightarrow$.

The reduction system will be used in the next two sections when we discuss static
linking for a simple class language and optimizations. The reduction relation $\rightarrow$ is
non-deterministic: multiple paths may emanate from a single expression, but it is confluent.
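
The difference between rewriting anywhere and evaluating in context can be seen in a small Python sketch. It covers only variables, abstraction, application, tuples, and the projection rule; the tagged-tuple term representation and the function names are assumptions made for this example, not part of the calculus.

```python
# Terms: ("var", x), ("lam", x, body), ("app", f, arg),
#        ("tuple", (e1, ..., en)), ("proj", i, e)

def is_value(t):
    """Variables, abstractions, and tuples of values count as values here."""
    tag = t[0]
    if tag in ("var", "lam"):
        return True
    if tag == "tuple":
        return all(is_value(e) for e in t[1])
    return False

def step_proj(t):
    """Primitive rule pi_i(v1, ..., vn) -> v_i; None if t is not such a redex."""
    if t[0] == "proj" and t[2][0] == "tuple":
        i, elems = t[1], t[2][1]
        if 1 <= i <= len(elems) and all(is_value(e) for e in elems):
            return elems[i - 1]
    return None

def rewrite_once(t):
    """Apply the projection rule at the first redex found anywhere in t,
    including under a lambda. Returns the new term, or None if none applies."""
    r = step_proj(t)
    if r is not None:
        return r
    tag = t[0]
    if tag == "lam":
        body = rewrite_once(t[2])
        return None if body is None else ("lam", t[1], body)
    if tag == "app":
        f = rewrite_once(t[1])
        if f is not None:
            return ("app", f, t[2])
        a = rewrite_once(t[2])
        return None if a is None else ("app", t[1], a)
    if tag == "tuple":
        for k, e in enumerate(t[1]):
            r = rewrite_once(e)
            if r is not None:
                return ("tuple", t[1][:k] + (r,) + t[1][k + 1:])
        return None
    if tag == "proj":
        r = rewrite_once(t[2])
        return None if r is None else ("proj", t[1], r)
    return None

# (lambda x. pi_1 (v1, x))(e), with e taken to be a variable for simplicity.
term = ("app",
        ("lam", "x", ("proj", 1, ("tuple", (("var", "v1"), ("var", "x"))))),
        ("var", "e"))
reduced = rewrite_once(term)
```

An evaluator based on evaluation contexts would never descend into the body of the abstraction, so it could not perform this step before the application itself is reduced.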



Theorem 1 If $e \rightarrow^* e'$ and $e \rightarrow^* e''$, there is an $e'''$ such that $e' \rightarrow^* e'''$ and $e'' \rightarrow^* e'''$.
The proof uses the Tait-Martin-Löf parallel moves method [Bar84]; we omit the proof.



4 A Simple Class Language 

To give evidence of the expressivity of λinkς, we now give a translation of a simple
class-based language into λinkς. Simpler translations may be possible, but the translation here
illustrates some techniques that are useful for more complex languages.

The source language is called Scl for “simple class language.” The syntax of Scl
appears in Figure 4. A program consists of a sequence of one or more class declarations
followed by an expression; class declarations may only use those declarations that
appear before and may not be recursive. There are two forms of class declaration. The
first is a base-class declaration, which defines a class as a collection of methods. The
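\[
\begin{array}{rcll}
\mathit{prog} & ::= & \mathit{dcl}\ \mathit{prog} & \text{Programs} \\
     & \mid & \mathit{exp} & \\
\mathit{dcl} & ::= & \mathtt{class}\ C\ \{\ \mathit{meths}\ \} & \text{Class declarations} \\
     & \mid & \mathtt{class}\ C\ \{\ \mathtt{inherit}\ C' : \{\ m^*\ \}\ \mathit{meths}\ \} & \\
\mathit{meths} & ::= & \epsilon & \\
     & \mid & \mathit{meth}\ \mathit{meths} & \\
\mathit{meth} & ::= & m(x)\,\mathit{exp} & \text{Methods} \\
\mathit{exp} & ::= & x & \text{Expressions} \\
     & \mid & \mathtt{self} & \\
     & \mid & \mathit{exp} \Leftarrow m(\mathit{exp}) & \\
     & \mid & \mathtt{super} \Leftarrow m(\mathit{exp}) & \\
     & \mid & \mathtt{new}\ C &
\end{array}
\]

Fig. 4. The syntax for Scl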



second form is a subclass declaration, which defines a class by inheriting methods from 
a superclass, overriding some of them, and then adding new methods. The subclass 
constrains the set of methods it visibly inherits from its superclass by listing the names 
of such methods as { m* }. Other method names can be called only by superclass, 
not subclass, methods. This operation — in essence, a restriction operation — resembles 
Moby’s support for private members [FR99b,FR99a] and subsumes mechanisms found 
in Java and other languages. 

At the expression level, Scl contains only those features relevant to linking. Methods 
take exactly one argument and have expressions for bodies; expressions include self, 
method dispatch, super-method dispatch, and object creation. A more complete language 
would include other expression forms, e.g., integers, booleans, and conditionals. 

The translation from Scl into λinkς fixes representations for classes, objects, and
methods. Each fully-linked class is translated to a triple $(\sigma, \phi, \mu)$, where $\sigma$ is the size
of the class (i.e., the number of slots in its method suite), $\phi$ is a dictionary for mapping
method names to method-suite indices, and $\mu$ is the class’s method suite. Each object is
translated to a pair of the object’s method suite and a dictionary for resolving method
names. Each method is translated into a pre-method [AC96]; i.e., a function that takes
self as its first parameter.

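The translation is defined by the following functions:

\[
\begin{array}{ll}
\mathcal{P}[\![\mathit{prog}]\!]_{\Gamma} & \text{Program translation} \\
\mathcal{C}[\![\mathit{dcl}]\!]_{\Gamma} & \text{Class translation} \\
\mathcal{M}[\![\mathit{meth}]\!]_{\mu_{super},\phi_{super},\phi_{self},\Gamma} & \text{Method translation} \\
\mathcal{E}[\![\mathit{exp}]\!]_{\mu_{super},\phi_{super},\phi_{self},\Gamma} & \text{Expression translation}
\end{array}
\]

These functions take a class environment $\Gamma$ as a parameter, which maps the name of a
class to its λinkς representation. A class environment is a tuple of fully-linked classes. The
symbol $\Gamma(C)$ denotes the position in the tuple associated with class C, and $\Gamma \pm \{C \mapsto e\}$
denotes the tuple with e bound to class C.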
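The translation of methods and expressions requires more parameters than the
translation of programs and classes. In addition to a class environment, the method and
expression translation functions take additional parameters to translate self and super. In
particular, the dictionary $\phi_{self}$ is used to resolve message sends to self, and the method
suite $\mu_{super}$ and dictionary $\phi_{super}$ are used to translate super invocations. Each method
is translated to a λinkς pre-method as follows:

\[
\mathcal{M}[\![m(x)\,\mathit{exp}]\!]_{\mu_{super},\phi_{super},\phi_{self},\Gamma} = \lambda(\mathit{self}, x).\,\mathcal{E}[\![\mathit{exp}]\!]_{\mu_{super},\phi_{super},\phi_{self},\Gamma}
\]

Expressions are translated as follows:

\[
\begin{array}{rcl}
\mathcal{E}[\![x]\!]_{\mu_{super},\phi_{super},\phi_{self},\Gamma} & = & x \\
\mathcal{E}[\![\mathtt{self}]\!]_{\mu_{super},\phi_{super},\phi_{self},\Gamma} & = & (\pi_1\,\mathit{self},\ \phi_{self}) \\
\mathcal{E}[\![\mathit{exp}_1 \Leftarrow m(\mathit{exp}_2)]\!]_{\mu_{super},\phi_{super},\phi_{self},\Gamma} & = & \mathbf{let}\ \mathit{obj} = \mathcal{E}[\![\mathit{exp}_1]\!]_{\mu_{super},\phi_{super},\phi_{self},\Gamma} \\
 & & \mathbf{let}\ \mathit{meth} = (\pi_1\,\mathit{obj}) @ ((\pi_2\,\mathit{obj})\,!\,m) \\
 & & \mathbf{in}\ \mathit{meth}(\mathit{obj},\ \mathcal{E}[\![\mathit{exp}_2]\!]_{\mu_{super},\phi_{super},\phi_{self},\Gamma}) \\
 & & \text{where } \mathit{obj} \text{ and } \mathit{meth} \text{ are fresh} \\
\mathcal{E}[\![\mathtt{super} \Leftarrow m(\mathit{exp})]\!]_{\mu_{super},\phi_{super},\phi_{self},\Gamma} & = & (\mu_{super} @ (\phi_{super}\,!\,m))(\mathit{self},\ \mathcal{E}[\![\mathit{exp}]\!]_{\mu_{super},\phi_{super},\phi_{self},\Gamma}) \\
\mathcal{E}[\![\mathtt{new}\ C]\!]_{\mu_{super},\phi_{super},\phi_{self},\Gamma} & = & \mathbf{let}\ (\sigma, \phi, \mu) = \Gamma(C)\ \mathbf{in}\ (\mu, \phi)
\end{array}
\]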

To translate self, we extract the method suite of self and pair it with the current self
dictionary, $\phi_{self}$. Note that because of method hiding, $\phi_{self}$ may have more methods than
$(\pi_2\,\mathit{self})$ [RS98,FR99b]. To translate message sends, we first translate the receiver object
and bind its value to obj. We then extract from this receiver its method suite $(\pi_1\,\mathit{obj})$
and its dictionary $(\pi_2\,\mathit{obj})$. Using dictionary application, we find the slot associated
with method m. Using that slot, we index into the method suite to extract the desired
pre-method, which we then apply to obj and the translated argument. We resolve super
invocations by selecting the appropriate code from the superclass method suite according
to the slot indicated in the superclass dictionary. Notice that this translation implements
the standard semantics of super-method dispatch; i.e., future overrides do not affect the
resolution of super-method dispatch. We translate the super keyword to the ordinary
variable self. In the translation of new, we look up the class to instantiate in the class
environment. In our simple language, the new object is a pair of the class’s method suite
and dictionary.
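
The object representation and the dispatch translations can be mimicked in Python roughly as follows. This is a sketch under assumptions: objects are (suite, dictionary) pairs, classes are (size, dictionary, suite) triples, pre-methods are ordinary Python functions of (self, argument), and the class with methods id and dup is invented for the example.

```python
def send(obj, label, arg):
    """Message send: find the slot in the receiver's dictionary, index the
    receiver's method suite, and apply the pre-method to (obj, arg)."""
    mu, phi = obj
    meth = mu[phi[label] - 1]          # slots are 1-based
    return meth(obj, arg)

def super_send(mu_super, phi_super, label, self_obj, arg):
    """Super dispatch: resolved against the superclass suite and dictionary,
    so later overrides do not affect it; self is passed as the receiver."""
    return mu_super[phi_super[label] - 1](self_obj, arg)

def new(cls):
    """Object creation: pair the class's method suite with its dictionary."""
    sigma, phi, mu = cls
    return (mu, phi)

def self_view(self_obj, phi_self):
    """Translation of self: self's method suite paired with the current
    self dictionary."""
    return (self_obj[0], phi_self)

# A hypothetical class with two methods; dup sends id to self.
PHI_C = {"id": 1, "dup": 2}

def pre_id(self, x):
    return x

def pre_dup(self, x):
    me = self_view(self, PHI_C)
    return (send(me, "id", x), x)

CLASS_C = (2, PHI_C, (pre_id, pre_dup))
obj = new(CLASS_C)
result = send(obj, "dup", 7)
```

Here the call send(obj, "dup", 7) goes through the dictionary twice, once for dup and once for the nested send of id to self.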

The translation for subclasses appears in Figure 5. In the translation, certain subterms
are annotated by a superscript L; these subterms denote link-time operations that are
reduced during class linking. In addition, we use the function Names(meths) to extract
the names of the methods in meths.

A subclass C is translated to a function f that maps any fully-linked representation
of its base class B to a fully-linked representation of C. The body of the linking function
f has three phases: slot calculation, dictionary definition, and method suite construction.
In the first phase, fresh slot numbers are assigned to new methods ($\sigma_{n_i}$), while overridden
($\sigma_{ov_i}$) and inherited methods ($\sigma_{inh_i}$) are assigned the slots they have in B. The size of the
subclass method suite ($\sigma_C$) is calculated to be the size of B’s suite plus the number of
new methods. In the dictionary definition phase, each visible method name is associated
with its slot number. During method suite construction, the definitions of overridden
methods are replaced in the method suite for B. The function then extends the resulting
method suite with the newly defined methods to produce the method suite for C.
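
The three phases of the linking function can be sketched in Python as below. The sketch is illustrative only: link_subclass and its parameters are hypothetical names, method translations are modeled as factory functions that receive the base suite, the base dictionary, and the subclass dictionary, and the example base class is invented.

```python
def link_subclass(base, visible, overrides, new_methods):
    """Link a subclass against a fully-linked base class (sigma_B, phi_B, mu_B).

    visible:     method names the subclass visibly inherits
    overrides:   {name: factory} for visible names that are redefined
    new_methods: [(name, factory)] for names not among the visible ones
    Each factory takes (mu_B, phi_B, phi_C) and returns a pre-method.
    """
    sigma_b, phi_b, mu_b = base
    # Phase 1: slot calculation.
    new_slots = {name: sigma_b + k
                 for k, (name, _) in enumerate(new_methods, start=1)}
    kept_slots = {name: phi_b[name] for name in visible}
    sigma_c = sigma_b + len(new_methods)
    # Phase 2: dictionary definition; hidden base names are simply left out.
    phi_c = {**kept_slots, **new_slots}

    # Phase 3: method suite construction.
    def build(factory):
        return factory(mu_b, phi_b, phi_c)

    mu = list(mu_b)
    for name, factory in overrides.items():
        mu[kept_slots[name] - 1] = build(factory)
    mu_c = tuple(mu) + tuple(build(factory) for _, factory in new_methods)
    return (sigma_c, phi_c, mu_c)

# Hypothetical base class; getTheta is not listed as visible by the subclass.
def base_move(self, arg):
    return ("base-move", arg)

def base_get_x(self, arg):
    return "base-getX"

def base_get_theta(self, arg):
    return "base-getTheta"

base = (3, {"move": 1, "getX": 2, "getTheta": 3},
        (base_move, base_get_x, base_get_theta))

def make_move(mu_b, phi_b, phi_c):
    def move(self, arg):
        # Super dispatch: resolved against the base suite and dictionary.
        return ("sub-move", mu_b[phi_b["move"] - 1](self, arg))
    return move

def make_get_theta(mu_b, phi_b, phi_c):
    return lambda self, arg: "sub-getTheta"

sigma_c, phi_c, mu_c = link_subclass(
    base, visible=["getX", "move"],
    overrides={"move": make_move},
    new_methods=[("getTheta", make_get_theta)])
```

In this example the subclass reuses the name getTheta for an independent method: it receives the fresh slot 4, while the hidden base method keeps slot 3 and stays in the suite.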
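\[
\begin{array}{l}
\mathcal{C}[\![\mathtt{class}\ C\ \{\ \mathtt{inherit}\ B : \{\ m^*\ \}\ \mathit{meths}\ \}]\!]_{\Gamma} = \\
\quad \lambda(\sigma_B, \phi_B, \mu_B). \\
\qquad \mathbf{let}^L\ \sigma_{n_1} = \sigma_B +^L 1 \ \ \ldots\ \ \mathbf{let}^L\ \sigma_{n_k} = \sigma_B +^L k \\
\qquad \mathbf{let}^L\ \sigma_{ov_1} = \phi_B\,!^L\,ov_1 \ \ \ldots\ \ \mathbf{let}^L\ \sigma_{ov_j} = \phi_B\,!^L\,ov_j \\
\qquad \mathbf{let}^L\ \sigma_{inh_1} = \phi_B\,!^L\,inh_1 \ \ \ldots\ \ \mathbf{let}^L\ \sigma_{inh_l} = \phi_B\,!^L\,inh_l \\
\qquad \mathbf{let}^L\ \sigma_C = \sigma_B +^L k \\
\qquad \mathbf{let}^L\ \phi_C = \{\ n_1 \mapsto \sigma_{n_1}, \ldots, n_k \mapsto \sigma_{n_k}, \\
\qquad\qquad\qquad\ \ ov_1 \mapsto \sigma_{ov_1}, \ldots, ov_j \mapsto \sigma_{ov_j}, \\
\qquad\qquad\qquad\ \ inh_1 \mapsto \sigma_{inh_1}, \ldots, inh_l \mapsto \sigma_{inh_l}\ \} \\
\qquad \mathbf{let}^L\ \mu_0 = \mu_B \\
\qquad \mathbf{let}^L\ \mu_1 = \mu_0\ @^L\ \sigma_{ov_1} \leftarrow \mathcal{M}[\![\mathit{meth}_{ov_1}]\!]_{\mu_B,\phi_B,\phi_C,\Gamma} \\
\qquad \quad \vdots \\
\qquad \mathbf{let}^L\ \mu_j = \mu_{j-1}\ @^L\ \sigma_{ov_j} \leftarrow \mathcal{M}[\![\mathit{meth}_{ov_j}]\!]_{\mu_B,\phi_B,\phi_C,\Gamma} \\
\qquad \mathbf{let}^L\ \mu_C = \mu_j\ \|^L\ \langle \mathcal{M}[\![\mathit{meth}_{n_1}]\!]_{\mu_B,\phi_B,\phi_C,\Gamma}, \ldots, \mathcal{M}[\![\mathit{meth}_{n_k}]\!]_{\mu_B,\phi_B,\phi_C,\Gamma} \rangle \\
\qquad \mathbf{in}\ (\sigma_C, \phi_C, \mu_C) \\[1ex]
\text{where} \\
\quad \mathit{NewNames} = \{n_1, \ldots, n_k\} = \mathit{Names}(\mathit{meths}) \setminus \{\ m^*\ \} \\
\quad \mathit{OvNames} = \{ov_1, \ldots, ov_j\} = \{\ m^*\ \} \cap \mathit{Names}(\mathit{meths}) \\
\quad \mathit{InhNames} = \{inh_1, \ldots, inh_l\} = \{\ m^*\ \} \setminus \mathit{OvNames} \\
\quad \{\mathit{meth}_{n_1}, \ldots, \mathit{meth}_{n_k}\} = \{ m(x)\,\mathit{exp} \mid m(x)\,\mathit{exp} \in \mathit{meths} \text{ and } m \in \mathit{NewNames} \} \\
\quad \{\mathit{meth}_{ov_1}, \ldots, \mathit{meth}_{ov_j}\} = \{ m(x)\,\mathit{exp} \mid m(x)\,\mathit{exp} \in \mathit{meths} \text{ and } m \in \mathit{OvNames} \}
\end{array}
\]

Fig. 5. Translating Scl classes to λinkς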



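For base-class declarations, the translation is similar, except that there are no inherited
or overridden methods. Furthermore, we use a special class $(0, \{\}, \langle\rangle)$ for the
base-class argument. We omit the details for space reasons. Finally, we translate programs as
follows:

\[
\begin{array}{rcl}
\mathcal{P}[\![\mathit{dcl}\ \mathit{prog}]\!]_{\Gamma} & = & \mathcal{P}[\![\mathit{prog}]\!]_{\Gamma'} \quad \text{where } \Gamma' = \Gamma \pm \{ C \mapsto \mathcal{C}[\![\mathit{dcl}]\!]_{\Gamma}(\Gamma(B)) \} \\
\mathcal{P}[\![\mathit{exp}]\!]_{\Gamma} & = & \mathcal{E}[\![\mathit{exp}]\!]_{\langle\rangle,\{\},\{\},\Gamma}
\end{array}
\]

The B stands for the base class in the definition of dcl.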

The language Scl enjoys the property that for a well-ordered program (one in
which all classes have been defined, and every class is defined before it is used) all
linking operations labeled L can be eliminated statically. More formally:

Theorem 2 If prog is a well-ordered program and $\mathcal{P}[\![\mathit{prog}]\!]_{\Gamma} = e$, then there is a term
$e'$ such that $e \rightarrow^* e'$ and $e'$ contains no linking operations labeled L.

This theorem can probably be proven using a size argument, but we use a
strong-normalization approach instead. The proof of strong normalization is a bit subtle because
expressions in λinkς can loop. We use a simple type system to show that a fragment
of λinkς is strongly normalizing. The proof of strong normalization relies upon Tait’s
method [GLT89]. One may show that the translation of a well-ordered program is
well-typed in the system, and hence all linking reductions can be done statically. We omit the
proof for space reasons.
5 Other Examples

We now sketch how λinkς can be used to compile class mechanisms found in various
programming languages.

5.1 Moby Classes 

We originally designed λinkς to support Moby’s class mechanism in a compiler that
we are writing. Section 4’s Scl models many of the significant parts of Moby’s class
mechanism, including one of its most difficult features to compile, namely its treatment of
private names. In particular, Moby relies on signature matching in its module mechanism
to hide private methods and fields [FR99a] (we illustrated this feature with the example
in Section 2). Because Moby signatures define opaque interfaces, the Moby compiler
cannot rely on complete representation information for the superclass of any class it is
compiling. Instead, it must use the class interface of the superclass (e.g., the Point class in
the PT signature) when compiling the subclass. Scl models this situation by requiring
each subclass to specify in the inherit clause which superclass methods are visible.

The main piece missing from Scl is fields (a.k.a. instance variables), which require
a richer version of λinkς. While fields require extending the representation of objects
with per-object instance variables, the details of instance variable access are very similar
to those of method dispatch. As with methods, fields require dictionaries to map labels
to slots and slot assignment. Dictionary creation and application are the same as for
methods. When we create an object using new, we use the size of the class’s instance
variables as the size of the object to create; object initialization is done imperatively.

5.2 OCaml Classes 

Like Moby, OCaml is a language with both parameterized modules and classes [Ler98].
For the most part, translating OCaml classes to λinkς is similar to translating Moby classes.
The one difference is that OCaml supports a simple form of multiple inheritance,
whereas Moby only has single inheritance. A class in OCaml can inherit from several
base classes, but there is no sharing between base classes: the methods of the base
classes are just concatenated. The one subtlety that we must address is that when compiling
a class definition, we cannot assume that access to its methods will be zero-based in
its subclasses. To solve this problem, we λ-abstract over the initial slot index. Otherwise,
translating OCaml classes to λinkς is essentially the same as for Moby classes.*
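
One way to picture abstracting over the initial slot index is the Python sketch below. It is an interpretation rather than the translation itself: make_class, inherit_all, and the two example classes are hypothetical, slots are numbered from 1 as in the sketches of method suites, and each class builds its own dictionary only once it learns where its block of slots begins.

```python
def make_class(method_names, make_pre_methods):
    """Return a function of the initial slot index; this function plays the
    role of the lambda-abstraction over the first slot of the class."""
    def at_offset(first_slot):
        phi = {name: first_slot + k for k, name in enumerate(method_names)}
        return phi, make_pre_methods(phi)
    return at_offset

def inherit_all(*bases):
    """Concatenate the method suites of several base classes, with no
    sharing; each base is told where its slots start."""
    phi, mu, next_slot = {}, (), 1
    for base in bases:
        base_phi, base_mu = base(next_slot)
        phi.update(base_phi)
        mu += base_mu
        next_slot += len(base_mu)
    return phi, mu

def point_methods(phi):
    def get_x(self, arg):
        return "x"
    return (get_x,)

def color_methods(phi):
    def get_color(self, arg):
        return "red"
    def describe(self, arg):
        mu, _ = self
        # The slot of getColor depends on where this class was placed.
        return mu[phi["getColor"] - 1](self, arg) + "!"
    return (get_color, describe)

point = make_class(["getX"], point_methods)
color = make_class(["getColor", "describe"], color_methods)

phi, mu = inherit_all(point, color)
obj = (mu, phi)
answer = mu[phi["describe"] - 1](obj, None)
```

Because color is instantiated after point, its methods land in slots 2 and 3, and describe still finds getColor correctly.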

5.3 Loom Classes 

In the language Loom [BFP97], the class construct is an expression form, and a deriving 
class may use an arbitrary expression to specify its base class. Thus, unlike the translation 
in Section 4, a translation of Loom to our calculus cannot have the phase distinction 
between class link-time and run-time. In a translated Loom program, computation of 
slots, dictionary construction, method overrides, and method suite extensions can all 
happen at run-time. The fact that we can use one representation to handle both static and 
dynamic classes demonstrates the flexibility of our approach. 

* To the best of our knowledge, the implementation techniques used for classes in the OCaml 
system have not been formalized or described in print, so we are not able to compare approaches. 
5.4 Mixins

Mixins are functions that map classes to classes [FKF98] and, unlike parameterized
modules, mixins properly extend the class that they are applied to (recall that applying
ColorPtFn to PolarPt hid the polar-coordinate interface). Supporting this kind of class
extension in λinkς requires a bit of programming. The trick is to include a dictionary
constructor function as an argument to the translated mixin. For example, consider the
following mixin, written in an extension of Scl syntax:

    mixin Print (C <: {show}) {
      meth print () { stdOut <= print (self <= show ()) }
    }

This mixin adds a print method to any class C that has a show method already. The
translation of this mixin to λinkς is similar to that of subclasses given in Section 4:

\[
\begin{array}{l}
\lambda(\sigma_C, \phi_C, \mu_C, \mathit{mkDict}). \\
\quad \mathbf{let}\ \sigma_{Print} = \sigma_C + 1 \\
\quad \mathbf{let}\ \phi_{Print} = \mathit{mkDict}(\phi_C, \sigma_{Print}) \\
\quad \mathbf{let}\ \mathit{pre\_print} = \lambda(\mathit{self}). \\
\qquad \mathbf{let}\ \mathit{print} = (\pi_1\,\mathit{stdOut}) @ ((\pi_2\,\mathit{stdOut})\,!\,\mathit{print}) \\
\qquad \mathbf{let}\ \mathit{show} = (\pi_1\,\mathit{self}) @ (\phi_{Print}\,!\,\mathit{show}) \\
\qquad \mathbf{in}\ \mathit{print}(\mathit{stdOut}, \mathit{show}(\mathit{self})) \\
\quad \mathbf{let}\ \mu_{Print} = \mu_C \,\|\, \langle \mathit{pre\_print} \rangle \\
\quad \mathbf{in}\ (\sigma_{Print}, \phi_{Print}, \mu_{Print})
\end{array}
\]

The main difference is that we use the mkDict function, supplied at the linking site,
to create the extended dictionary. An alternative to this approach is to add a dictionary
extension operation to λinkς. For purposes of this example, we assume that the surface
language does not permit method-name conflicts between the argument class and the
mixin, but it is possible to support other policies, such as C++-style qualified method
names, to resolve conflicts.
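
The role of the dictionary constructor can be illustrated with a Python sketch of the Print mixin. Everything concrete here is an assumption made for the example: the std_out object writes to a list, the argument class has a single show method, pre-methods take only self, and the new method is given the slot just past the end of the argument class's suite.

```python
printed = []                      # stand-in for the output device

def out_print(self, text):
    printed.append(text)

# stdOut modeled as an object: (method suite, dictionary).
std_out = ((out_print,), {"print": 1})

def print_mixin(cls, mk_dict):
    """Extend a fully-linked class (sigma_C, phi_C, mu_C) with a print method.

    mk_dict is supplied at the linking site and builds the extended
    dictionary from the old dictionary and the slot of the new method.
    """
    sigma_c, phi_c, mu_c = cls
    sigma_print = sigma_c + 1     # assumed: next free slot
    phi_print = mk_dict(phi_c, sigma_print)

    def pre_print(self):
        do_print = std_out[0][std_out[1]["print"] - 1]
        show = self[0][phi_print["show"] - 1]
        return do_print(std_out, show(self))

    mu_print = mu_c + (pre_print,)
    return (sigma_print, phi_print, mu_print)

def mk_dict(phi, slot):
    """Linking-site dictionary constructor: add the label print at slot."""
    return {**phi, "print": slot}

# A hypothetical argument class with a show method.
shape = (1, {"show": 1}, (lambda self: "a shape",))
sigma, phi, mu = print_mixin(shape, mk_dict)
obj = (mu, phi)
mu[phi["print"] - 1](obj)
```

Unlike the module application of Section 2, the resulting dictionary keeps every label of the argument class and adds one more, so the mixin properly extends the class.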



5.5 C++ and Java Classes 

For a language with a manifest class hierarchy, such as C++ or Java, the language’s static 
type system provides substantial information about the representation of dictionaries 
and method suites. By exploiting this representation information, we can optimize away 
all of the dictionary-related overhead in such programs, which results in the efficiency 
of method dispatch that C++ and Java programmers expect. The disadvantage of this 
approach is that it introduces representation dependencies that lead to the so-called 
fragile base class problem, in which changing the private representation of a base class 
forces recompilation of its subclasses. We should note that we do not know how to handle 
C++’s form of multiple inheritance in λinkς because of the object layout issues related to 
sharing of virtual base classes [Str94]. 



6 Optimization 

Many compilers for higher-order languages use some form of λ-calculus as their
intermediate representation (IR). In this section, we show that the techniques commonly used 
in λ-calculus-based compilers can be used to optimize our encoding of method dispatch 
in λinkς. Because λinkς allows reuse of standard optimizations, the optimizer is simpler 
and more likely to be correct. It is important to note that the optimizations described in 
this section also apply to objects with instance variables. Even though instance varia- 
bles are mutable, the optimizations focus on the dictionary and method-suite operations, 
which are pure. Consequently, the compiler is free to move these operations, subject 
only to the constraints of their data dependencies. 

To make the discussion concrete, we consider the λinkς representation of Scl programs
and their optimization. In general, method dispatch in Scl requires an expensive 
lookup operation to map a method’s label to its method-suite slot. Often, however, it is 
possible to apply transformations to reduce or eliminate this cost. We assume that we 
are optimizing well-typed programs that do not have run-time type errors (see Fisher 
and Reppy [FR99b] for an appropriate type system). We also assume that we produce 
the IR from Scl as described in Section 4, with the further step of normalizing the terms 
into a direct-style representation [FSDF93,Tar96,OT98] (a continuation-passing style 
representation [App92] is also possible). In this IR, all intermediate results are bound to 
variables, and the right-hand side of every binding involves a single function application or 
primitive operation applied to atomic arguments (i.e., either variables or constants). 



6.1 Applying CSE and Hoisting 

Common subexpression elimination (CSE) is a standard optimization whereby two iden- 
tical pure expressions are replaced by a single expression. When method invocations are 
expanded into the λinkς representation, there are many opportunities for CSE optimizations.
For example, if there are two method invocations to the same object, fetching its 
dictionary will be a common subexpression. If the method calls are to the same method, 
then the dictionary application and method suite indexing operations will be common 
subexpressions. 
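
The effect of CSE on two dispatches to the same method of the same object can be sketched as follows. This is a minimal illustration, not the paper's compiler: the tuple-based IR and the operation names `dict_of`, `suite_of`, `lookup` and `index` are hypothetical stand-ins for fetching an object's dictionary and method suite, applying the dictionary to a label, and indexing the method suite.

```python
# Hypothetical straight-line IR: each binding is (name, op, args).
# The four operations below are assumed to be pure, so they may be shared.
PURE_OPS = {"dict_of", "suite_of", "lookup", "index"}


def cse(bindings):
    """Replace repeated pure expressions by the first variable bound to them."""
    seen = {}    # (op, args) -> variable that already holds the value
    rename = {}  # eliminated variable -> surviving variable
    out = []
    for name, op, args in bindings:
        # Rewrite arguments that refer to eliminated variables.
        args = tuple(rename.get(a, a) for a in args)
        key = (op, args)
        if op in PURE_OPS and key in seen:
            rename[name] = seen[key]
        else:
            if op in PURE_OPS:
                seen[key] = name
            out.append((name, op, args))
    return out


# Two invocations of method "m" on the same object.
program = [
    ("d1", "dict_of", ("obj",)),
    ("s1", "lookup", ("d1", "m")),
    ("ms1", "suite_of", ("obj",)),
    ("f1", "index", ("ms1", "s1")),
    ("r1", "call", ("f1", "obj", "x")),
    ("d2", "dict_of", ("obj",)),
    ("s2", "lookup", ("d2", "m")),
    ("ms2", "suite_of", ("obj",)),
    ("f2", "index", ("ms2", "s2")),
    ("r2", "call", ("f2", "obj", "y")),
]
optimized = cse(program)
```

In this sketch the second dispatch keeps only its call, which now reuses the method `f1` resolved by the first dispatch.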

Another standard transformation is to hoist invariant expressions out of functions. 
When applied to method dispatch, this transformation amortizes the cost of a dictionary 
application over multiple function applications or loop iterations. (Note that loops are
represented as tail-recursive functions in this style of IR.)

6.2 Self-Method Dispatch 

While CSE and hoisting apply to any method dispatch, we can do significantly better 
when we have a message sent to self. Recall that the translation of the self-method 
dispatch self ⇐ m(exp) into λinkς is 

    let obj = (π₁(self), φ_self)
    let meth = π₁(obj) @ (π₂(obj) ! m)
    in meth(obj, exp)

Normalizing to our IR and applying the standard contraction phase [App92] gives the 
following: 

    let μ = π₁(self)
    let obj = (μ, φ_self)
    let σ = φ_self ! m
    let meth = μ @ σ
    in meth(obj, a)

where a is the atom resulting from normalizing the argument expression. The expression 
φ_self ! m is invariant in its containing premethod, and thus the binding of σ can be lifted out 
of the premethod. This transformation has the effect of moving the dictionary application 
from run-time to link-time and leaves the following residual: 

    let μ = π₁(self)
    let obj = (μ, φ_self)
    let meth = μ @ σ
    in meth(obj, a)

While it is likely that a compiler will generate this reduced form directly from a
source-program self-method dispatch, this optimization is useful in the case where other
optimizations (e.g., inlining) expose self-method dispatches that are not present in the 
source. 
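
The move of the dictionary application from run-time to link-time can be illustrated with closures. This is only a sketch under simplifying assumptions: a dictionary is modelled as a function from labels to slots, a method suite as a list of premethods, and an object as a (suite, dictionary) pair; the names `link_naive`, `link_hoisted` and `count_lookups` are hypothetical.

```python
def pre_double(self, x):
    """A premethod that ignores self and doubles its argument."""
    return 2 * x


def link_naive(phi_self):
    # The dictionary is applied every time the premethod runs.
    def pre_twice(self, x):
        mu = self[0]
        meth = mu[phi_self("double")]
        return meth(self, meth(self, x))
    return pre_twice


def link_hoisted(phi_self):
    # The binding of the slot is lifted out of the premethod,
    # so the dictionary is applied once, when the class is linked.
    sigma = phi_self("double")

    def pre_twice(self, x):
        mu = self[0]
        meth = mu[sigma]
        return meth(self, meth(self, x))
    return pre_twice


def count_lookups(link, calls):
    """Link a two-method class, call 'twice' repeatedly, count dictionary applications."""
    counter = [0]

    def phi(label):
        counter[0] += 1
        return {"double": 0, "twice": 1}[label]

    suite = [pre_double, link(phi)]
    obj = (suite, phi)
    results = [suite[1](obj, i) for i in range(calls)]
    return counter[0], results


naive = count_lookups(link_naive, 5)
hoisted = count_lookups(link_hoisted, 5)
```

Both variants compute the same results; the hoisted variant applies the dictionary once regardless of the number of calls.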

6.3 Super-Method Dispatch 

Calls to superclass methods can be resolved statically, so there should be no run-time 
penalty for superclass method dispatch. While it is possible to “special-case” such method 
calls in a compiler, we can get the same effect by code hoisting. Recall that the translation 
of the super-method dispatch super ⇐ m(exp) into λinkς is 

    (μ_super @ (φ_super ! m))(self, exp)

As before, we normalize to our IR and contract, which produces the following: 

    let σ = φ_super ! m
    let meth = μ_super @ σ
    in meth(self, a)

where a is the atom resulting from normalizing the argument expression. In this case, 
we can hoist both the dictionary application and the method-suite indexing out of the 
containing method, which leaves the term “meth(self, a).” Thus, by using standard
λ-calculus transformations, we can resolve super-method dispatch statically. Furthermore, 
if the superclass’s method suite is known at compile time, then the standard optimization 
of reducing a selection from a known record can be applied to turn the call into a direct 
function call. This reduction has the further effect of enabling the call to be inlined. 

6.4 Using Static Analysis 

The optimizations that we have described so far require only trivial analysis. More 
sophisticated analyses can yield better optimizations [DGC95]. For example, receiver- 
class prediction [GDGC95] may permit us to eliminate some dictionary applications in 
method dispatches (as we do already for self-method dispatch). There may also be source- 
language type information, such as final annotations, that can enable optimizations, 
such as static method resolution. 



6.5 Final Code Generation 

We intentionally left the implementation of dictionaries abstract in λinkς so that the 
optimization techniques described above can be used independently of their concrete 
representation. Depending on the properties of the source language, dictionaries might 
be tables [Rem92,DH95], a graph structure [CC98], or a simple list of method names. 
We might also use caching techniques to improve dispatch performance when there is 
locality [DS84]. We might also maintain information in the compiler as to the origin of the 
dictionary and use multiple representations, each tailored to a particular dictionary origin. 
For example, a Java compiler can distinguish between dictionaries that correspond to 
classes and dictionaries that correspond to interfaces. In the former case, the dictionary 
is known at class-load time and dictionary applications can be resolved when the class 
is loaded and linked. For interfaces, however, a dictionary might be implemented as an 
indirection table [LST99]. 



7 Related Work 

There is other published research on IRs for compiling class-based languages. The 
Vortex project at the University of Washington, for instance, supports a number of class- 
based languages using a common optimizing back-end [DDG+95]. The Vortex IR has 
fairly high-level operations to support classes: class construction and method dispatch 
are both monolithic primitives. λinkς, on the other hand, breaks these operations into 
smaller primitives. By working at a finer level of granularity, λinkς is able to support a 
wider range of class mechanisms in a single framework (e.g., Vortex cannot support the 
dynamic classes found in Loom). 

Another approach pursued by researchers is to encode object-oriented features in 
typed λ-calculi. While such an approach can support any reasonable surface language 
design, its effectiveness as an implementation technique depends on the character of 
the encoding. For example, League et al. have recently proposed a translation of a 
Java subset into the FLINT intermediate representation extended with row polymor- 
phism [LST99]. Although they do not have an implementation yet, their encoding seems 
efficient, but it is heavily dependent on the semantics of Java. For example, their trans- 
lation relies on knowing the exact set of interfaces that a class implements. The encoding 
approach has also been recently tried by Vanderwaart for Loom [Van99]. In this case, 
because of the richness of Loom’s feature set, the encoding results in an inefficient 
implementation of operations like method dispatch. We believe that a compiler based 
on λinkς can do at least as well for Java as the encoding approach, while doing much 
better for languages like Moby and Loom that do not have efficient encodings in the 
λ-calculus. 

In other related work, Bono et al. have designed a class calculus, based on the
λ-calculus, for evaluating single and mixin inheritance [BPS99]. The focus of their work 
differs from ours, in that their calculus describes the core functionality of a particular 
surface language, whereas we provide the basic building blocks with which to implement
a myriad of surface designs. Essentially, their language could be implemented in 
λinkς; the translation from their calculus to λinkς would capture the implementation 
information encoded in their operational semantics. 

There are other formal linking frameworks [Car97,Ram96,GM99,AZ99,DEW99]. 
Of particular relevance here are uses of β-reduction to implement linking of modules, as 
we do for the linking of classes. From the very beginning, the Standard ML of New Jersey 
compiler has used the λ-calculus to express module linking [AM87]. More recently, Flatt 
and Felleisen describe a calculus for separate compilation that maps units to functions 
over their free variables [FF98]. 



8 Conclusions 

We have presented λinkς, a low-level calculus for representing class-based
object-oriented languages. λinkς satisfies the goals we set in designing an IR. In particular, 
it provides support for inheritance from non-manifest base classes, such as occurs in 
Moby, OCaml, and Loom. It is amenable to formal reasoning, such as in the proof 
of termination of linking in Section 4. As illustrated in Section 5, λinkς is expressive 
enough to support a wide range of surface languages, from the concrete representations
of Java to the dynamic classes of Loom. Finally, simple λ-calculus optimizations, 
such as common subexpression elimination and hoisting, yield standard object-oriented 
optimizations, such as method caching, when applied to λinkς terms. 

We are currently implementing a compiler for Moby that uses λinkς as the basis 
of the object fragment of its IR. One refinement that we use in our implementation is 
to syntactically distinguish between the link-time and run-time forms of λinkς. In the 
future, we plan to explore the use of λinkς to support dynamic class loading and mobile 
code, and to develop a typed IR based on λinkς. 



Abstract. Abstract interpretation theory has successfully been used for 
constructing algorithms to statically determine run-time properties of 
programs. Central is the notion of an abstract domain, describing certain 
properties of interest about the program. In logic programming, program 
analyses typically fall into two different categories: either they detect 
program points where the property definitely holds (universal analyses) 
or possibly holds (existential analyses). We study the relation between 
such analyses in the case where the concrete domain is a lattice join- 
generated by its set of join-irreducible elements. Although our intended 
application is for logic programming, the theory is sufficiently general for 
possible applications to other languages. 



1 Introduction 

Abstract interpretation theory has successfully been used for constructing al- 
gorithms to statically determine run-time properties of programs. Traditionally, 
the semantics of the program is specified with a concrete domain. The central no- 
tion is to approximate program semantics by defining an abstract domain whose 
operations mimic those of the concrete domain. The abstract domain describes 
certain properties of interest about the program. Each element of the abstract 
domain specifies information about a possibly infinite number of concrete sta- 
tes. Thus, in order to construct an abstract domain tracing a property of the 
program, the property needs to be considered as a property over sets of concrete 
states. 

Our aim is to provide new techniques for the construction of new abstract 
domains from given ones. Many operations have been designed for systematically 
constructing new domains. Domain operators studied include reduced product 
[8,4], reduced power [8] and disjunctive completion [8,11]. Linear refinement is 
introduced in [13] as an extension of the Heyting completion studied in [14]. In 
[15], a new domain for freeness analysis of logic programs is defined using linear 
refinement. In this paper, we suppose that the concrete domain is a lattice join- 
generated by its set of join-irreducible elements. In this case, given any property 
p defined over each individual concrete state, p can always be uniformly extended 
to a property over sets of concrete states. 

For example, in logic programming it is standard to define the concrete domain
as the powerset of substitutions, $\wp(Sub)$, partially ordered by set inclusion. 
$\wp(Sub)$ is join-generated by $Sub$. For many properties of logic programs, it is
natural to first define the property on substitutions and then lift the property to 
include sets of substitutions. Consider the property of groundness. A variable $x$ 
is ground under a substitution $\theta \in Sub$ if $\theta$ binds $x$ to a term with no variables. 
Letting $X$ be the set of variables of interest, the mapping $gr : Sub \to \wp(X)$ is 
defined: 

$$gr(\theta) = \{x \in X \mid var(\theta(x)) = \emptyset\}.$$

Suppose we now want to consider groundness as a property with domain $\wp(Sub)$. 
We can consider either definite (universal) groundness or possible (existential) 
groundness. For definite groundness, $Gr^{\forall} : \wp(Sub) \to \wp(X)$ is defined: 

$$Gr^{\forall}(\Theta) = \bigcap \{gr(\theta) \mid \theta \in \Theta\}.$$

For possible groundness, $Gr^{\exists} : \wp(Sub) \to \wp(X)$ is defined: 

$$Gr^{\exists}(\Theta) = \bigcup \{gr(\theta) \mid \theta \in \Theta\}.$$

Note that definite groundness traces positive information about the groundness 
of program variables, whereas possible groundness traces negative information. 
Knowledge of both positive and negative information about program properties 
such as groundness is particularly useful for debugging applications. 
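
The two liftings of groundness can be sketched in Python. The representation is an assumption made for illustration only: a substitution is a dict mapping a variable to the set of variables occurring in the term it is bound to (so the empty set means a ground term), and an unbound variable is treated as bound to itself.

```python
def var_of(theta, x):
    """Variables of theta(x); an unbound variable maps to itself."""
    return theta.get(x, frozenset({x}))


def gr(theta, X):
    """Variables of X that theta binds to a term with no variables."""
    return frozenset(x for x in X if not var_of(theta, x))


def gr_definite(Theta, X):
    """Intersection of gr over Theta (the empty intersection is X)."""
    result = frozenset(X)
    for theta in Theta:
        result &= gr(theta, X)
    return result


def gr_possible(Theta, X):
    """Union of gr over Theta."""
    result = frozenset()
    for theta in Theta:
        result |= gr(theta, X)
    return result


X = {"x", "y", "z"}
theta1 = {"x": frozenset(), "y": frozenset()}        # x and y ground
theta2 = {"x": frozenset(), "y": frozenset({"z"})}   # x ground, y bound to a term over z
definite = gr_definite([theta1, theta2], X)
possible = gr_possible([theta1, theta2], X)
```

Here `x` is ground under both substitutions, so it is definitely ground, while `y` is ground under only one of them, so it is possibly but not definitely ground.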

In general, given a concrete domain $C$, an abstract domain $D$ and a property 
$p$ mapping the join-irreducible elements of $C$ to $D$, $p$ is extended to $C$ using the 
join operation of $D$. We name this extension of $p$ the $D$-lattice property of $p$. For 
example, $Gr^{\forall}$ is the $D_{gr}^{d}$-lattice property of $gr$ where $D_{gr}^{d}$ is the lattice $\wp(X)$, 
partially ordered by $\supseteq$ with set intersection as the join operation. $Gr^{\exists}$ is the 
$D_{gr}$-lattice property of $gr$ where $D_{gr}$ is the lattice $\wp(X)$, partially ordered by 
$\subseteq$ with set union as the join operation. 

The main theoretical results shown are as follows: 

– Given a Galois connection $(C, \alpha, D, \gamma)$ (where $C$ is completely distributive 
and join-generated by its set of join-irreducible elements) specifying an
analysis tracing positive information of $p$, we show how to construct a mirror 
Galois connection $(C, \alpha^{m}, D^{d}, \gamma^{m})$ (where $D^{d}$ is the dual lattice of $D$)
specifying an analysis tracing negative information of $p$. 

– Suppose $op : C \to C$ is a concrete operation and $(D, op')$ is a correct
abstract interpretation of $(C, op)$ specified by $(C, \alpha, D, \gamma)$. We find conditions 
on $(D, op')$ and $(C, op)$ which ensure that $(D^{d}, op')$ is a correct abstract
interpretation of $(C, op)$ specified by $(C, \alpha^{m}, D^{d}, \gamma^{m})$. 

The paper is organised as follows: in Section 3 we define the notion of lattice 
properties and mirror properties. Section 4 considers some applications with 
the well-known domains Pos and Sharing of logic programming. In Section 5 
we consider the safe approximation of concrete functions in analyses for mirror 
properties. Finally, Section 6 gives some concluding remarks and directions for 
future work. 

2 Preliminaries 

Throughout the paper, we assume familiarity with the basic notions of lattice 
theory ([3]) and abstract interpretation ([7,8,9]). Below we introduce notation 
and recall some of the central notions. 

2.1 Lattice Theory 

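In the following, we assume $(A, \sqsubseteq_A, \sqcap_A, \sqcup_A, \bot_A, \top_A)$ is a complete lattice. The 
dual lattice $(A, \sqsubseteq_A^{d}, \sqcap_A^{d}, \sqcup_A^{d}, \bot_A^{d}, \top_A^{d})$ is defined such that: 

1. $\forall a, b \in A.\ a \sqsubseteq_A^{d} b$ iff $b \sqsubseteq_A a$; 
2. $\sqcap_A^{d} = \sqcup_A$; 
3. $\sqcup_A^{d} = \sqcap_A$; 
4. $\top_A^{d} = \bot_A$; 
5. $\bot_A^{d} = \top_A$. 

We will often write $A^{d}$ to denote the dual lattice $(A, \sqsubseteq_A^{d}, \sqcap_A^{d}, \sqcup_A^{d}, \bot_A^{d}, \top_A^{d})$. Given 
a mapping $f : A_1 \to A_2$, we will sometimes abuse notation by also writing $f$ to 
denote the dual mapping $f^{d} : A_1^{d} \to A_2^{d}$ such that $f(a) = f^{d}(a)$ for all $a \in A_1$. 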

An element $a \in A$ is join-irreducible if, for any $S \subseteq A$, $a = \sqcup_A S$ implies 
$a \in S$. The set of join-irreducible elements of $A$ is denoted by $JI(A)$. Letting 
$S \subseteq A$, then $A$ is join-generated by $S$ if, for all $a \in A$, $a = \sqcup_A \{x \in S \mid x \sqsubseteq_A a\}$. 
For convenience, we assume $\bot_A = \sqcup_A \emptyset$. An element $a \in A$ is an atom if $a$ covers 
$\bot_A$, i.e. $a \neq \bot_A$ and $\forall x \in A.\ (\bot_A \sqsubset_A x \sqsubseteq_A a) \Rightarrow (x = a)$. We denote by 
$atom_A$ the set of atoms of $A$. Note that $atom_A \subseteq JI(A)$. $A$ is atomistic if $A$ is 
join-generated by $atom_A$. $A$ is dual-atomistic if $A^{d}$ is atomistic. 

A complete lattice $A$ is completely distributive if, for any $\{x_{i,k} \mid i \in I, k \in K(i)\} \subseteq A$,
the following identity holds: 

$$\bigsqcap_{i \in I} \bigsqcup_{k \in K(i)} x_{i,k} = \bigsqcup_{f \in I \to K} \bigsqcap_{i \in I} x_{i,f(i)},$$

where for any $i \in I$, $K(i)$ is a set of indices, and $I \to K$ is the set of all functions 
$f$ from $I$ to $\bigcup_{i \in I} K(i)$ such that $\forall i \in I.\ f(i) \in K(i)$. 

Example 1. The powerset of any set $S$, $\wp(S)$, ordered with set-theoretic inclusion,
is completely distributive and join-generated by $S$. In this case $\wp(S)$ is also 
an atomistic lattice where the atoms are the elements of $S$. 

The key property of completely distributive lattices we shall use is: 

Lemma 1 ([2]). Let $A$ be a completely distributive lattice. Then, $x \in JI(A)$ 
iff for any $S \subseteq A$, $x \sqsubseteq_A \sqcup_A S$ implies $x \sqsubseteq_A s$ for some $s \in S$. 
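
For a small powerset lattice, both the definition of join-irreducibility and the characterisation in Lemma 1 can be checked by brute force. The sketch below assumes a three-element base set and enumerates every family of lattice elements; it is an illustration of the definitions, not part of the theory.

```python
from itertools import combinations


def powerset(items):
    """All subsets of items, as frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def join(family):
    """Least upper bound in a powerset lattice: set union."""
    out = frozenset()
    for s in family:
        out |= s
    return out


S = {"a", "b", "c"}
A = powerset(S)          # the 8 elements of the lattice
families = powerset(A)   # every subset T of the lattice (256 families)


def is_join_irreducible(a):
    # a = join(T) implies a in T, for every family T
    return all(a in T for T in families if join(T) == a)


def lemma1_condition(x):
    # x below join(T) implies x below some member of T
    return all(any(x <= s for s in T) for T in families if x <= join(T))


ji = [a for a in A if is_join_irreducible(a)]
by_lemma = [a for a in A if lemma1_condition(a)]
```

In this instance both tests select exactly the singletons, which are the atoms of the powerset lattice.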

2.2 Galois Connections 

If $C$ and $D$ are posets and $\alpha : C \to D$, $\gamma : D \to C$ are functions such that 
$\forall c \in C.\ \forall d \in D.\ \alpha(c) \sqsubseteq_D d \Leftrightarrow c \sqsubseteq_C \gamma(d)$, then $(C, \alpha, D, \gamma)$ is a Galois
connection between $C$ and $D$. If in addition $\gamma$ is 1-1, or, equivalently, $\alpha$ is onto, then 
$(C, \alpha, D, \gamma)$ is a Galois insertion of $D$ in $C$. In the setting of abstract
interpretation, $C$ and $D$ are called the concrete and abstract domains, respectively. Given 
a Galois connection $(C, \alpha, D, \gamma)$, $\alpha$ and $\gamma$ are uniquely determined by each other. 
A practical consequence of this is that an abstract interpretation can be performed
by defining only one of $\alpha$ or $\gamma$. We assume that every concrete domain $C$ 
and abstract domain $D$ form complete lattices. Given a concrete domain $C$ and 
an abstract domain $D$, a property is defined as a (partial) mapping from $C$ to 
$D$. Every Galois connection $(C, \alpha, D, \gamma)$ can be viewed as a specification of the 
property $\alpha : C \to D$. 

An important property of Galois connections is the preservation of bounds. 
Suppose $C$, $D$ are complete lattices. A mapping $\alpha : C \to D$ is additive if it 
preserves least upper bounds. Thus if $S \subseteq C$ then $\alpha(\sqcup_C S) = \sqcup_D \{\alpha(c) \mid c \in S\}$. 
A mapping $\alpha : C \to D$ is co-additive if $\alpha : C^{d} \to D^{d}$ is additive. If $(C, \alpha, D, \gamma)$ 
is a Galois connection, then $\alpha$ is additive. The converse is also true, i.e. if $\alpha$ 
is additive then $\alpha$ entirely determines a unique Galois connection $(C, \alpha, D, \gamma)$. 
Thus in order to define a Galois connection $(C, \alpha, D, \gamma)$ (where $C$, $D$ are complete 
lattices), it is sufficient to define an additive $\alpha$. 

One way of defining new Galois connections is by composition. Given two 
Galois connections $(C, \alpha_A, A, \gamma_A)$ and $(A, \alpha_D, D, \gamma_D)$, $(C, \alpha_D \circ \alpha_A, D, \gamma_A \circ \gamma_D)$ 
is a Galois connection. We call $(C, \alpha_D \circ \alpha_A, D, \gamma_A \circ \gamma_D)$ the composition of 
$(C, \alpha_A, A, \gamma_A)$ and $(A, \alpha_D, D, \gamma_D)$. 

Suppose $(C, \alpha, D, \gamma)$ is a Galois connection and $op_C : C \to C$, $op_D : D \to D$ 
are operations on $C$ and $D$, respectively. $(D, op_D)$ is a correct abstract
interpretation of $(C, op_C)$ specified by $(C, \alpha, D, \gamma)$ if $\alpha(op_C(\gamma(d))) \sqsubseteq_D op_D(d)$ for 
all $d \in D$. $(D, op_D)$ is optimal if $op_D = \alpha \circ op_C \circ \gamma$. If $(D, op_D)$ is optimal, 
then $op_D$ is the best approximation of $op_C$ relative to $D$. $(D, op_D)$ is complete if 
$\alpha \circ op_C = op_D \circ \alpha$. Completeness is a stronger property than optimality. Indeed, 
whenever $(D, op_D)$ is complete, it can be shown that $op_D = \alpha \circ op_C \circ \gamma$ [10,12]. 
The completeness of $op_C$ depends on $D$ and is a property of the abstract domain. 
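
These notions can be checked exhaustively on a toy instance. The sketch below is an illustrative assumption, not taken from the paper: the concrete domain is the powerset of {0, 1, 2, 3}, the abstract domain is a four-element parity lattice, and the concrete operation is the successor modulo 4 applied elementwise.

```python
from itertools import combinations

U = range(4)
C = [frozenset(c) for r in range(5) for c in combinations(U, r)]
D = ["bot", "even", "odd", "top"]
GAMMA = {
    "bot": frozenset(),
    "even": frozenset({0, 2}),
    "odd": frozenset({1, 3}),
    "top": frozenset(U),
}


def leq_D(d1, d2):
    """Abstract ordering, induced by inclusion of concretisations."""
    return GAMMA[d1] <= GAMMA[d2]


def gamma(d):
    return GAMMA[d]


def alpha(c):
    """Least abstract element whose concretisation contains c."""
    return min((d for d in D if c <= GAMMA[d]), key=lambda d: len(GAMMA[d]))


def op_C(c):
    """Concrete operation: successor modulo 4, elementwise."""
    return frozenset((n + 1) % 4 for n in c)


def op_D(d):
    """Optimal abstract operation: alpha . op_C . gamma."""
    return alpha(op_C(gamma(d)))


# Adjunction: alpha(c) <= d  iff  c <= gamma(d)
is_galois = all(leq_D(alpha(c), d) == (c <= gamma(d)) for c in C for d in D)
# Correctness: alpha(op_C(gamma(d))) <= op_D(d)
is_correct = all(leq_D(alpha(op_C(gamma(d))), op_D(d)) for d in D)
# Completeness: alpha . op_C = op_D . alpha
is_complete = all(alpha(op_C(c)) == op_D(alpha(c)) for c in C)
```

In this instance the optimal abstract successor swaps `even` and `odd`, and it happens to be complete because the parity of a successor is determined by the parity of its argument.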

If $(C, \alpha, D, \gamma)$ is a Galois insertion, each value of the abstract domain $D$ 
is useful in the representation of the concrete domain as all the elements of $D$ 
represent distinct members of $C$. Moreover, any Galois connection may be lifted 
to a Galois insertion. This is done by identifying those values of the abstract 
domain with the same concrete meaning into an equivalence class. This process 
is known as reduction of the abstract domain. Each Galois insertion $(C, \alpha, D, \gamma)$ 
can equivalently be considered as an upper closure operator on $C$, $\rho = \gamma \circ \alpha$. For 
every Galois connection $(C, \alpha, D, \gamma)$, let $(C, \alpha_{\equiv}, D_{\equiv}, \gamma_{\equiv})$ be the Galois insertion 
obtained by reducing $(C, \alpha, D, \gamma)$. We associate the (upper) closure operator 
$\rho = \gamma_{\equiv} \circ \alpha_{\equiv}$ with $(C, \alpha, D, \gamma)$. The set of closure operators on $C$ is partially 
ordered such that $\rho_1 \sqsubseteq \rho_2$ if $\forall c \in C.\ \rho_1(c) \sqsubseteq_C \rho_2(c)$. In this approach, the order 
relation on the set of closure operators on $C$ corresponds to the order by means 
of which abstract domains are compared with regard to precision. More formally, 
if $(C, \alpha_1, D_1, \gamma_1)$ and $(C, \alpha_2, D_2, \gamma_2)$ are Galois connections with the associated 
closure operators $\rho_1$ and $\rho_2$, respectively, then we say $D_1$ is more precise than 
$D_2$ if $\rho_1 \sqsubseteq \rho_2$. 



3 Properties of Programs 

In abstract interpretation, Galois connections are used to specify properties of 
programs. To define a Galois connection $(C, \alpha, D, \gamma)$ between a concrete domain 
$C$ and an abstract domain $D$, all we need to do is define an additive function 
$\alpha : C \to D$. It is well known that in the case where the concrete lattice $C$ is 
join-generated by $JI(C)$, additive functions mapping $C$ to an abstract domain 
$D$ are completely determined by their values for join-irreducible elements. More 
specifically, if $\alpha : C \to D$ is additive then 

$$\alpha(c) = \bigsqcup_D \{\alpha(x) \mid x \in JI(C) \wedge x \sqsubseteq_C c\}.$$

Example 2. For logic programs, a standard choice of concrete lattice is the
atomistic lattice $C_L = (\wp(Sub), \subseteq, \cap, \cup, \emptyset, Sub)$, where $Sub$ denotes the set of
idempotent substitutions. 

A program variable is ground if it is bound to a unique value. Groundness 
can be thought of as a property over $Sub$, i.e. as a property over $JI(C_L)$. Let 
$X$ be the set of variables of interest. Then the set of variables ground under 
$\theta \in Sub$ is given by $gr : JI(C_L) \to \wp(X)$ defined 

$$gr(\theta) = \{x \in X \mid var(\theta(x)) = \emptyset\}.$$

Let $\Theta \subseteq Sub$. The set of variables that are definitely ground under all $\theta \in \Theta$ is 
given by $Gr^{\forall} : C_L \to \wp(X)$ where 

$$Gr^{\forall}(\Theta) = \{x \in X \mid \forall \theta \in \Theta.\ var(\theta(x)) = \emptyset\} = \bigcap \{gr(\theta) \mid \theta \in \Theta\}.$$

Alternatively, the set of variables that are possibly ground, i.e. ground under some $\theta \in \Theta$, is 
given by $Gr^{\exists} : C_L \to \wp(X)$ where 

$$Gr^{\exists}(\Theta) = \{x \in X \mid \exists \theta \in \Theta.\ var(\theta(x)) = \emptyset\} = \bigcup \{gr(\theta) \mid \theta \in \Theta\}. \qquad \Box$$



Definition 1. Let $C$ be a lattice. Then $p$ is a JI property for $C$ if there exists 
a set $D$ such that $p$ maps $JI(C)$ to $D$ (denoted $p : JI(C) \to D$). □ 

Definition 2. Suppose $C$ is join-generated by $JI(C)$ and let $p : JI(C) \to D$ 
be a JI property for $C$. Suppose $D$ forms a complete lattice under the partial 
ordering $\sqsubseteq_D$. Then the $D$-lattice property of $p$, $P : C \to D$, is defined such that 
for every $c \in C$, 

$$P(c) = \bigsqcup_D \{p(x) \mid x \in JI(C) \wedge x \sqsubseteq_C c\}.$$

Let $D^{d}$ be the dual lattice of $D$. If $P$ is the $D$-lattice property of $p$ then we define 
the mirror property of $P$ to be the $D^{d}$-lattice property of $p$. □ 

Note that the mirror of the mirror of $P$ is $P$. 

Example 3. Let $D_{gr}$ be the complete lattice $(\wp(X), \subseteq, \cap, \cup, \emptyset, X)$. In Example 2, 
$Gr^{\exists}$ is the $D_{gr}$-lattice property of $gr$, and $Gr^{\forall}$ is the $D_{gr}^{d}$-lattice property of $gr$. 
Hence $Gr^{\forall}$ and $Gr^{\exists}$ are mirror properties. □ 
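
The construction of a lattice property and its mirror can be sketched generically when the concrete domain is a powerset, so that the join-irreducible elements below a set are its members. The example property (small prime divisors of an integer) and the helper name `lattice_property` are hypothetical and chosen only for illustration.

```python
from functools import reduce

PRIMES = frozenset({2, 3, 5})


def lattice_property(p, join, bottom):
    """Extend p from elements to sets by folding with the join of the target lattice."""
    def P(c):
        return reduce(join, (p(x) for x in c), bottom)
    return P


def p(n):
    """Property of a single concrete element: its divisors among 2, 3, 5."""
    return frozenset(q for q in PRIMES if n % q == 0)


# Target ordered by inclusion: join is union, bottom is the empty set.
P = lattice_property(p, frozenset.union, frozenset())
# Dual target: join is intersection, bottom is the top of the original lattice.
P_mirror = lattice_property(p, frozenset.intersection, PRIMES)

c = {6, 10, 30}
some = P(c)          # primes dividing some element of c
every = P_mirror(c)  # primes dividing every element of c
```

Swapping the join and the bottom element a second time gives back `P`, which reflects the remark that the mirror of the mirror is the original property.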

In the case where C is also a completely distributive lattice, we have the following 
theorem. 

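Theorem 1. Suppose $C$ is a completely distributive lattice join-generated by 
$JI(C)$ and $D$ is a complete lattice. Let $(C, \alpha, D, \gamma)$ be a Galois connection. Then 
there exist $\alpha^{m}$, $\gamma^{m}$ such that 

1. $\alpha^{m}$ is the mirror property of $\alpha$. 

2. $(C, \alpha^{m}, D^{d}, \gamma^{m})$ is a Galois connection. 

Proof. To prove 1, observe that as $C$ is join-generated by $JI(C)$, for each $c \in C$, 

$$\alpha(c) = \bigsqcup_D \{\alpha(x) \mid x \in JI(C) \wedge x \sqsubseteq_C c\}.$$

Hence by Definition 2, 

$$\alpha^{m}(c) = \bigsqcap_D \{\alpha(x) \mid x \in JI(C) \wedge x \sqsubseteq_C c\}.$$

To prove 2, it is sufficient to show that $\alpha^{m}$ is additive. But 

$$\alpha^{m}(\sqcup_C S) = \bigsqcap_D \{\alpha(x) \mid x \in JI(C) \wedge x \sqsubseteq_C \sqcup_C S\} \quad \text{(by Definition 2)}$$

$$= \bigsqcap_D \{\alpha(x) \mid x \in JI(C) \wedge x \sqsubseteq_C s \wedge s \in S\} \quad \text{(by Lemma 1)}$$

$$= \bigsqcap_D \{\alpha^{m}(s) \mid s \in S\}.$$

Hence $\alpha^{m}$ is additive. □ 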

The compositional design of Galois connections is a method for specifying pro- 
gram properties by successive refinements. The following lemma gives a suffic- 
ient condition for the preservation of compositions of Galois connections between 
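mirror properties. 

Lemma 2. Suppose $C$ is a completely distributive lattice join-generated by 
$JI(C)$, and $A$, $D$ are complete lattices. Suppose $(C, \alpha_P, D, \gamma_P)$, $(C, \alpha^{m}_P, D^{d}, \gamma^{m}_P)$, 
$(C, \alpha_A, A, \gamma_A)$ and $(C, \alpha^{m}_A, A^{d}, \gamma^{m}_A)$ are Galois connections such that $\alpha_P$, $\alpha^{m}_P$ 
and $\alpha_A$, $\alpha^{m}_A$ are mirror properties. Also suppose $(A, \alpha_D, D, \gamma_D)$ is a Galois 
connection such that $(C, \alpha_P, D, \gamma_P)$ is the composition of $(C, \alpha_A, A, \gamma_A)$ and 
$(A, \alpha_D, D, \gamma_D)$. Then if $\alpha_D$ is co-additive, there exists $\gamma'_D : D^{d} \to A^{d}$ such that 
$(A^{d}, \alpha_D, D^{d}, \gamma'_D)$ forms a Galois connection and $(C, \alpha^{m}_P, D^{d}, \gamma^{m}_P)$ is the
composition of $(C, \alpha^{m}_A, A^{d}, \gamma^{m}_A)$ and $(A^{d}, \alpha_D, D^{d}, \gamma'_D)$. 

Proof. First note that $\alpha_D : A \to D$ is co-additive implies that $\alpha_D : A^{d} \to D^{d}$ is 
additive, and so there exists $\gamma'_D : D^{d} \to A^{d}$ such that $(A^{d}, \alpha_D, D^{d}, \gamma'_D)$ forms a 
Galois connection. 

To show that $(C, \alpha^{m}_P, D^{d}, \gamma^{m}_P)$ is the composition of $(C, \alpha^{m}_A, A^{d}, \gamma^{m}_A)$ and 
$(A^{d}, \alpha_D, D^{d}, \gamma'_D)$, it is sufficient to show that $\alpha^{m}_P = \alpha_D \circ \alpha^{m}_A$. Suppose $c \in C$. By 
Definition 2, 

$$\alpha^{m}_P(c) = \bigsqcap_D \{\alpha_P(x) \mid x \in JI(C) \wedge x \sqsubseteq_C c\}.$$

Now $\alpha_P(x) = \alpha_D(\alpha_A(x))$ and so 

$$\alpha^{m}_P(c) = \bigsqcap_D \{\alpha_D(\alpha_A(x)) \mid x \in JI(C) \wedge x \sqsubseteq_C c\}.$$

But $\alpha_D$ is co-additive and so 

$$\alpha^{m}_P(c) = \alpha_D\Big(\bigsqcap_A \{\alpha_A(x) \mid x \in JI(C) \wedge x \sqsubseteq_C c\}\Big) = \alpha_D(\alpha^{m}_A(c)). \qquad \Box$$

Let $\rho_P$, $\rho_A$ be the associated closure operators of $(C, \alpha_P, D, \gamma_P)$ and $(C, \alpha_A, A, \gamma_A)$, 
respectively. Note that whenever $(C, \alpha_P, D, \gamma_P)$ is the composition of $(C, \alpha_A, A, \gamma_A)$ 
and $(A, \alpha_D, D, \gamma_D)$, then $\rho_A \sqsubseteq \rho_P$. Thus Lemma 2 can be interpreted as giving a 
sufficient condition for the preservation of the relative precision between mirror 
properties, that is, when $\rho_A \sqsubseteq \rho_P$ implies $\rho^{m}_A \sqsubseteq \rho^{m}_P$ (where $\rho^{m}_A$, $\rho^{m}_P$ are the
associated closure operators of $(C, \alpha^{m}_A, A^{d}, \gamma^{m}_A)$ and $(C, \alpha^{m}_P, D^{d}, \gamma^{m}_P)$, respectively). 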

4 Applications 

We consider the abstract domains Pos and Sharing from logic programming. 
In the following, let Vars denote a countable set of variables, and X denote a 
non-empty finite subset of Vars containing the variables of interest. 

4.1 Pos 

We briefly recall the definition of Pos. The domain Pos consists of the set of 
positive propositional formulae on $X$, where a propositional formula is positive 
if it is satisfied when every variable is assigned the value true. Pos is a lattice 
whose ordering is given by logical consequence, and the join and meet by
logical disjunction and conjunction, respectively. Adding the bottom propositional 
formula false to Pos makes Pos a complete lattice. Letting $C_L$ be the concrete 
domain defined in Example 2, the Galois insertion $(C_L, \alpha_{pos}, Pos, \gamma_{pos})$ is such 
that $\alpha_{pos} : C_L \to Pos$ where for all $\Theta \in C_L$, 

$$\alpha_{pos}(\Theta) = \bigvee_{\theta \in \Theta} \bigwedge_{x \in X} \Big(x \leftrightarrow \bigwedge var(\theta(x))\Big).$$

Note that $\alpha_{pos}$ is the Pos-lattice property of the JI property $p_{pos} : Sub \to Pos$ 
defined such that 

$$p_{pos}(\theta) = \bigwedge_{x \in X} \Big(x \leftrightarrow \bigwedge var(\theta(x))\Big).$$

The abstract unification function for Pos, $Unif^{Pos} : Pos \times Pos \to Pos$, is given 
by logical conjunction, that is, the meet operation of Pos. 

Recall that in Examples 2 and 3, definite groundness is specified by $Gr^{\forall}$. In 
fact $Gr^{\forall}$ maps $C_L$ onto $D_{gr}^{d}$ and so there exists $\gamma^{\forall}$ such that $(C_L, Gr^{\forall}, D_{gr}^{d}, \gamma^{\forall})$ 
forms a Galois insertion. This domain is originally due to Jones and Søndergaard 
[16]. In [18], when considering the concrete domain to be sets of substitutions 
closed by instantiation, it is shown that Pos can be constructed by using only 
the definition of groundness. More specifically, [18] shows that Pos is exactly the 
least abstract domain which contains all the (double) intuitionistic implications 
between elements of $D_{gr}^{d}$. 

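Let $\alpha_D : Pos \to D_{gr}^{d}$ be defined such that for all $\phi \in Pos$, 

$$\alpha_D(\phi) = \{x \in X \mid \phi \models x\}.$$

Now $\alpha_D$ is additive since $\alpha_D(\phi_1 \vee \phi_2) = \alpha_D(\phi_1) \cap \alpha_D(\phi_2)$. Hence there exists 
$\gamma_D$ such that $(Pos, \alpha_D, D_{gr}^{d}, \gamma_D)$ forms a Galois connection. Also $Gr^{\forall}(\Theta) = \alpha_D(\alpha_{pos}(\Theta))$
for all $\Theta \in C_L$, therefore $(C_L, Gr^{\forall}, D_{gr}^{d}, \gamma^{\forall})$ is the composition of 
$(C_L, \alpha_{pos}, Pos, \gamma_{pos})$ and $(Pos, \alpha_D, D_{gr}^{d}, \gamma_D)$. 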
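
The relationship between the Pos abstraction and definite groundness can be checked on a small instance. The sketch below rests on assumptions made for illustration: a formula over $X$ is represented by its set of models (each model being the set of variables assigned true), a substitution is a dict from a variable to the set of variables of its binding, unbound variables map to themselves, and all variables occurring in bindings belong to $X$.

```python
from itertools import combinations

X = ("x", "y", "z")
ASSIGNMENTS = [frozenset(c) for r in range(len(X) + 1)
               for c in combinations(X, r)]


def binding_vars(theta, v):
    """Variables of theta(v); an unbound variable maps to itself."""
    return theta.get(v, frozenset({v}))


def p_pos(theta):
    """Models of the conjunction over v in X of (v <-> AND var(theta(v)))."""
    return frozenset(m for m in ASSIGNMENTS
                     if all((v in m) == (binding_vars(theta, v) <= m) for v in X))


def alpha_pos(Theta):
    """Disjunction of the formulas of the substitutions: union of model sets."""
    out = frozenset()
    for theta in Theta:
        out |= p_pos(theta)
    return out


def alpha_D(phi):
    """Variables entailed by phi: those true in every model."""
    return frozenset(v for v in X if all(v in m for m in phi))


def gr(theta):
    return frozenset(v for v in X if not binding_vars(theta, v))


theta1 = {"x": frozenset(), "y": frozenset({"z"})}   # x ground, y bound to a term over z
theta2 = {"x": frozenset(), "y": frozenset()}        # x and y ground
phi1, phi2 = p_pos(theta1), p_pos(theta2)
definite = gr(theta1) & gr(theta2)
```

For these two substitutions, abstracting to Pos and then to groundness gives the same answer as intersecting the ground variables directly, and the disjunction of the two formulas is mapped to the intersection of their entailed variables.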

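The mirror property of $Gr^{\forall}$ is $Gr^{\exists}$. Now $Gr^{\exists}$ maps $C_L$ onto $D_{gr}$ and so there 
exists $\gamma^{\exists}$ such that $(C_L, Gr^{\exists}, D_{gr}, \gamma^{\exists})$ forms a Galois insertion. 

The mirror property of $\alpha_{pos}$ is $\alpha^{m}_{pos} : C_L \to Pos^{d}$ where 

$$\alpha^{m}_{pos}(\Theta) = \bigwedge_{\theta \in \Theta} \bigwedge_{x \in X} \Big(x \leftrightarrow \bigwedge var(\theta(x))\Big).$$

Lemma 3. There exists $\gamma^{m}_{pos}$ such that $(C_L, \alpha^{m}_{pos}, Pos^{d}, \gamma^{m}_{pos})$ forms a Galois 
connection. Also $(C_L, Gr^{\exists}, D_{gr}, \gamma^{\exists})$ is the composition of $(C_L, \alpha^{m}_{pos}, Pos^{d}, \gamma^{m}_{pos})$ 
and $(Pos^{d}, \alpha_D, D_{gr}, \gamma'_D)$. 

Proof. By Theorem 1 there exists $\gamma^{m}_{pos}$ such that $(C_L, \alpha^{m}_{pos}, Pos^{d}, \gamma^{m}_{pos})$ forms a 
Galois connection. Now $\alpha_D(\phi \wedge \psi) = \alpha_D(\phi) \cup \alpha_D(\psi)$, and so $\alpha_D : Pos \to D_{gr}^{d}$ 
is co-additive. Therefore by Lemma 2, $(C_L, Gr^{\exists}, D_{gr}, \gamma^{\exists})$ is the composition of 
$(C_L, \alpha^{m}_{pos}, Pos^{d}, \gamma^{m}_{pos})$ and $(Pos^{d}, \alpha_D, D_{gr}, \gamma'_D)$. □ 

Lemma 4. If $Card(X) \geq 2$, $\alpha^{m}_{pos}$ is not onto, thus $(C_L, \alpha^{m}_{pos}, Pos^{d}, \gamma^{m}_{pos})$ is not 
a Galois insertion. 

Proof. By inspecting the definition of $\alpha^{m}_{pos}$ it can be seen that $\alpha^{m}_{pos}(\Theta) \neq \bigvee X$ 
when $Card(X) \geq 2$, for any $\Theta \in C_L$. Hence $\alpha^{m}_{pos}$ is not onto. □ 

In order to obtain a Galois insertion, we apply the reduction process to $Pos^{d}$. 
$(C_L, \alpha^{m}_{pos}, Pos^{d}, \gamma^{m}_{pos})$ reduces to $(C_L, \alpha^{m}_{pos/\equiv}, Pos^{d}/{\equiv}, \gamma^{m}_{pos/\equiv})$ where for $[\phi]_{\equiv} \in Pos^{d}/{\equiv}$, 

$$\gamma^{m}_{pos/\equiv}([\phi]_{\equiv}) = \gamma^{m}_{pos}(\phi), \qquad \alpha^{m}_{pos/\equiv}(c) = [\phi]_{\equiv} \text{ where } \phi = \alpha^{m}_{pos}(c).$$

Let $\Gamma \subseteq Pos^{d}$ be defined such that 

$$\Gamma = \{x \leftrightarrow \bigwedge \{y_1, \ldots, y_n\} \mid \forall\, 1 \leq i \leq n.\ x \neq y_i\}.$$

By inspecting the definition of $\alpha^{m}_{pos}$ (and noting that $Sub$ is the set of idempotent 
substitutions, i.e. $\theta \in Sub$ implies $x \notin var(\theta(x))$ for all $x$), it can be seen that 
$Pos^{d}/{\equiv}$ is the lattice $\Delta \subseteq Pos^{d}$ where $\Delta$ is the closure of $\Gamma$ under conjunction. 
From Lemma 3 we obtain: 

Theorem 2. $Pos^{d}/{\equiv}$ is more precise than $D_{gr}$. 

Thus the precision ordering has been preserved for the mirror properties. 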



4.2 Sharing 

We define Sharing as in [1]. We define the set sharing domain $SH = \wp(SG)$ 
where $SG = \{S \in \wp(X) \mid S \neq \emptyset\}$. $SH$ is partially ordered by set inclusion such 
that the join is given by set union and the meet by set intersection. 

Let $C_L$ be the concrete domain defined in Example 2. The set of variables 
occurring in a substitution $\theta$ through the variable $x$ is given by the mapping 
$occs : Sub \times Vars \to \wp(X)$ defined such that 

$$occs(\theta, x) = \{y \in X \mid x \in var(\theta(y))\}.$$

Given this, the Galois insertion $(C_L, \alpha_{sh}, SH, \gamma_{sh})$ specifying $SH$ can be defined 
such that 

$$\alpha_{sh}(\Theta) = \bigcup_{\theta \in \Theta} \{occs(\theta, x) \mid x \in Vars,\ occs(\theta, x) \neq \emptyset\}.$$

Note that ash is the SH-lattice property of the J/ property Psh '■ Sub -£ SH 
defined such that 

Psh{9) = {occs{9,x) I X £ Vars, occs(9, x) yf 0}. 

For Sharing, the abstract unification function is defined as a mapping which
captures the effects of a binding $x \mapsto t$ on an element of $SH$. The definition
uses the following three operations defined over $SH$.

The function $bin : SH \times SH \to SH$, called binary union, is given by

$$bin(S_1, S_2) = \{s_1 \cup s_2 \mid s_1 \in S_1,\ s_2 \in S_2\}.$$

The star-union function $(\cdot)^* : SH \to SH$ is given by

$$S^* = \{s \in SG \mid \exists S' \subseteq S.\ s = \bigcup S'\}.$$

The relevant component function $rel : \wp(X) \times SH \to SH$ is given by

$$rel(V, S) = \{s \in S \mid s \cap V \neq \emptyset\}.$$

Let $v_x = \{x\}$, $v_t = var(t)$ and $v_{xt} = v_x \cup v_t$. Then

$$Unif^{sh}(S, x \mapsto t) = (S \setminus rel(v_{xt}, S)) \cup bin(rel(v_x, S)^*, rel(v_t, S)^*).$$
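The three operations and the resulting abstract unification can be sketched in C
as follows. The bitmask encoding, the limit of five variables, and the names
group_t, sh_t, star and unif_sh are assumptions made for this illustration and
are not taken from [1]. The function star computes the closure by iterating
binary union to a fixpoint, which for a finite sharing set yields the same
result as the definition of $S^*$.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding (not from the paper): X has at most 5 variables, so a
   sharing group is a nonzero 5-bit mask and a sharing set is a 32-bit word
   whose bit g is set iff group g belongs to the set. Bit 0, which would
   stand for the empty group, is never set. */
typedef uint32_t group_t;
typedef uint32_t sh_t;

enum { NGROUPS = 32 };

static int has(sh_t S, group_t g) { return (int)((S >> g) & 1u); }

/* rel(V, S): the groups of S that intersect V */
sh_t rel(group_t V, sh_t S) {
    sh_t r = 0;
    for (group_t g = 1; g < NGROUPS; g++)
        if (has(S, g) && (g & V) != 0) r |= (sh_t)1 << g;
    return r;
}

/* bin(S1, S2): all pairwise unions of a group of S1 with a group of S2 */
sh_t bin(sh_t S1, sh_t S2) {
    sh_t r = 0;
    for (group_t a = 1; a < NGROUPS; a++) {
        if (!has(S1, a)) continue;
        for (group_t b = 1; b < NGROUPS; b++)
            if (has(S2, b)) r |= (sh_t)1 << (a | b);
    }
    return r;
}

/* star(S): closure of S under union, computed as a fixpoint of bin */
sh_t star(sh_t S) {
    sh_t cur = S, next;
    while ((next = cur | bin(cur, cur)) != cur) cur = next;
    return cur;
}

/* Abstract unification for the binding x -> t, with vx = {x}, vt = var(t) */
sh_t unif_sh(sh_t S, group_t vx, group_t vt) {
    assert((S & 1u) == 0);            /* the empty group is not a member */
    sh_t affected = rel(vx | vt, S);
    return (S & ~affected) | bin(star(rel(vx, S)), star(rel(vt, S)));
}
```

With x, y, z encoded as the bits 1, 2 and 4, the sharing set {{x},{y},{z}} is
the word 22, and unif_sh(22, 1, 2), which models the binding of x to a term
whose only variable is y, returns 24, the encoding of {{x,y},{z}}.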

A domain for pair sharing is $PS = \wp(Pairs(X))$ where
$Pairs(X) = \{\{x,y\} \mid x, y \in X,\ x \neq y\}$. $PS$ is specified by the Galois
insertion $(Cl, \alpha_{ps}, PS, \gamma_{ps})$, where

$$\alpha_{ps}(\Theta) = \bigcup_{\theta \in \Theta} \{\{x,y\} \in Pairs(X) \mid var(\theta(x)) \cap var(\theta(y)) \neq \emptyset\}.$$

Note that $\alpha_{ps}$ is the $PS$-lattice property of the JI property
$p_{ps} : Sub \to PS$ defined such that

$$p_{ps}(\theta) = \{\{x,y\} \in Pairs(X) \mid var(\theta(x)) \cap var(\theta(y)) \neq \emptyset\}.$$

Defining $\alpha_{sp} : SH \to PS$ such that

$$\alpha_{sp}(S) = \bigcup\{Pairs(s) \mid s \in S\},$$

it follows that $\alpha_{ps}(\Theta) = \alpha_{sp}(\alpha_{sh}(\Theta))$ for all
$\Theta \in Cl$. Also $\alpha_{sp}(S_1 \cup S_2) = \bigcup\{Pairs(s) \mid s \in S_1 \cup S_2\}
= \alpha_{sp}(S_1) \cup \alpha_{sp}(S_2)$. Therefore $\alpha_{sp}$ is additive
and so there exists $\gamma_{sp}$ such that $(SH, \alpha_{sp}, PS, \gamma_{sp})$ forms a
Galois connection. It follows that $(Cl, \alpha_{ps}, PS, \gamma_{ps})$ is the
composition of $(Cl, \alpha_{sh}, SH, \gamma_{sh})$ and $(SH, \alpha_{sp}, PS, \gamma_{sp})$,
and so $PS$ is more abstract than $SH$.

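The mirror property of $\alpha_{sh}$ is $\alpha^m_{sh} : Cl \to SH^m$ defined such that

$$\alpha^m_{sh}(\Theta) = \bigcap_{\theta \in \Theta} \{occs(\theta, x) \mid x \in Vars,\ occs(\theta, x) \neq \emptyset\}.$$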



Lemma 5. There exists $\gamma^m_{sh}$ such that $(Cl, \alpha^m_{sh}, SH^m, \gamma^m_{sh})$
forms a Galois insertion.

Proof. By Theorem 1, there exists $\gamma^m_{sh}$ such that
$(Cl, \alpha^m_{sh}, SH^m, \gamma^m_{sh})$ forms a Galois connection. To prove
$\alpha^m_{sh}$ is onto, we show
$\forall a \in SH^m.\ \exists \theta \in Sub.\ \alpha^m_{sh}(\{\theta\}) = a$
by induction on Card(a).

The base case is when $a = \emptyset$. Let $\theta = \{x \mapsto t \mid x \in X\}$ where
$t$ is a ground term. Then $\alpha^m_{sh}(\{\theta\}) = \emptyset$.

Suppose $\exists s \in a$ and let $a' = a \setminus \{s\}$. Using the induction
hypothesis, $\exists \theta' \in Sub.\ \alpha^m_{sh}(\{\theta'\}) = a'$. Let
$u \in Vars \setminus X$ be a variable such that $u \notin var(\theta'(x))$ for any
$x \in X$. For every $y \in s$, suppose $\theta'(y) = t'_y$. Let $t_y$ be a term such
that $var(t_y) = var(t'_y) \cup \{u\}$. Then defining $\theta$ such that
$\theta(x) = t_x$ for all $x \in s$ and $\theta(x) = \theta'(x)$ otherwise,
$\alpha^m_{sh}(\{\theta\}) = a$. □

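The mirror property of $\alpha_{ps}$ is $\alpha^m_{ps} : Cl \to PS^m$ defined such that

$$\alpha^m_{ps}(\Theta) = \bigcap_{\theta \in \Theta} \{\{x,y\} \in Pairs(X) \mid var(\theta(x)) \cap var(\theta(y)) \neq \emptyset\}.$$

Lemma 6. There exists $\gamma^m_{ps}$ such that $(Cl, \alpha^m_{ps}, PS^m, \gamma^m_{ps})$
forms a Galois insertion.

Proof. By Theorem 1, there exists $\gamma^m_{ps}$ such that
$(Cl, \alpha^m_{ps}, PS^m, \gamma^m_{ps})$ forms a Galois connection. We show that
$\alpha^m_{ps}$ is onto.

First suppose $a = \top_{ps} = Pairs(X)$. Let $u \in Vars \setminus X$. Then if
$\theta(x) = u$ for every $x \in X$, $\alpha^m_{ps}(\{\theta\}) = Pairs(X)$ as required.

Suppose $a \neq \top_{ps}$. $PS$ is dual-atomistic with
$atom^d_{ps} = \{Pairs(X) \setminus \{\{x,y\}\} \mid \{x,y\} \in Pairs(X)\}$. Therefore
for every $a \neq \top_{ps}$, $a = \bigcap\{x \mid x \in atom^d_{ps} \wedge a \subseteq x\}$.
But $\alpha^m_{ps}(\Theta) = \bigcap\{p_{ps}(\theta) \mid \theta \in \Theta\}$ and so it is
sufficient to show that
$\forall a \in atom^d_{ps}.\ \exists \theta \in Sub.\ p_{ps}(\theta) = a$.

Suppose $a = Pairs(X) \setminus \{\{x,y\}\}$ and let $u, v \in Vars \setminus X$. Defining
$\theta$ such that $\theta(x) = u$, $\theta(y) = v$ and $\theta(z) = f(u, v)$ for every
$z \in X \setminus \{x, y\}$, $p_{ps}(\theta) = a$. □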



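Theorem 3. If Card(X) ≥ 3 then $SH^m$ is not more precise than $PS^m$.

Proof. We need to show there exists $\Theta \in Cl$ such that
$\gamma^m_{sh}(\alpha^m_{sh}(\Theta)) \not\subseteq \gamma^m_{ps}(\alpha^m_{ps}(\Theta))$.

Suppose $X = \{x, y, z\}$ (it is easy to generalise the proof for Card(X) > 3).
Let $\Theta = \{\theta_1, \theta_2\}$ where $\theta_1 = \{x \mapsto y, z \mapsto y\}$ and
$\theta_2 = \{x \mapsto y\}$. It follows that
$\gamma^m_{sh}(\alpha^m_{sh}(\{\theta_1, \theta_2\})) = \gamma^m_{sh}(\{\{x,y,z\}\} \cap \{\{x,y\}\}) = \gamma^m_{sh}(\emptyset) = Sub$.
But $\gamma^m_{ps}(\alpha^m_{ps}(\{\theta_1, \theta_2\})) = \gamma^m_{ps}(\{\{x,y\}\}) \subset Sub$.
Therefore $\gamma^m_{sh}(\alpha^m_{sh}(\Theta)) \not\subseteq \gamma^m_{ps}(\alpha^m_{ps}(\Theta))$. □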



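Thus in general the precision ordering is not preserved for mirror properties.

Theorem 4. $PS^m$ is not more precise than $SH^m$.

Proof. We need to show there exists $\Theta \in Cl$ such that
$\gamma^m_{ps}(\alpha^m_{ps}(\Theta)) \not\subseteq \gamma^m_{sh}(\alpha^m_{sh}(\Theta))$.

Let $\Theta = \{\epsilon\}$ where $\epsilon$ is the identity substitution. Now
$\gamma^m_{sh}(\alpha^m_{sh}(\{\epsilon\})) = \gamma^m_{sh}(\{\{x\} \mid x \in X\}) \subset Sub$
and $\gamma^m_{ps}(\alpha^m_{ps}(\{\epsilon\})) = \gamma^m_{ps}(\emptyset) = Sub$. Therefore
$\gamma^m_{ps}(\alpha^m_{ps}(\Theta)) \not\subseteq \gamma^m_{sh}(\alpha^m_{sh}(\Theta))$. □

Hence the precision of $SH^m$ and $PS^m$ is not comparable in general.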







5 Operations on Concrete Domains 

When the concrete lattice C is join-generated by JI{C), many operations on C 
can be defined in terms of operations on JI(C). 

Definition 3. Suppose $C$ is join-generated by $JI(C)$. Then $op$ is a JI operation
if $op : JI(C) \times JI(C) \to JI(C)$.¹ For each concrete operation
$Op : C \times C \to C$, we say $Op$ is uniformly defined from a JI operation $op$ if
for all $c_1, c_2 \in C$,

$$Op(c_1, c_2) = \bigsqcup_C \{op(x_1, x_2) \mid x_1, x_2 \in JI(C) \wedge x_1 \sqsubseteq_C c_1 \wedge x_2 \sqsubseteq_C c_2\}.$$



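Example 4. In logic programming, unification and projection can both be
defined as JI operations $unif : Sub \times Sub \to Sub$, $proj_V : Sub \to Sub$ (for
$V \subseteq Vars$) as follows:

$$unif(\theta_1, \theta_2) = mgu(eqn(\theta_1), eqn(\theta_2)),$$

$$proj_V(\theta) = \theta' \text{ where for each } x \in Vars,\quad \theta'(x) = \begin{cases} \theta(x) & \text{if } x \in V \\ x & \text{otherwise} \end{cases}$$

where $eqn(\theta) = \{x = t \mid x \mapsto t \in \theta\}$.

The concrete operations $Unif : Cl \times Cl \to Cl$ and $Proj_V : Cl \to Cl$ can
be uniformly defined from $unif$ and $proj_V$ as follows:

$$Unif(\Theta_1, \Theta_2) = \bigcup\{unif(\theta_1, \theta_2) \mid \theta_1 \in \Theta_1 \wedge \theta_2 \in \Theta_2\},$$

$$Proj_V(\Theta) = \bigcup\{proj_V(\theta) \mid \theta \in \Theta\}.$$ □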

Given an abstract operation $Op_D$, we show that if $(D, Op_D)$ is a complete (and
therefore also correct) abstract interpretation of $(C, Op)$, then $(D^m, Op_D)$ is a
correct abstract interpretation of $(C, Op)$.

Lemma 7. Suppose $C$, $D$ are complete lattices and $C$ is join-generated by $JI(C)$.
Let $Op : C \times C \to C$ be a concrete operation uniformly defined from the JI
operation $op : JI(C) \times JI(C) \to JI(C)$. Let $(D, Op_D)$ be a complete abstract
interpretation of $Op$ specified by $(C, \alpha, D, \gamma)$. Then $(D^m, Op_D)$ is a
correct abstract interpretation of $(C, Op)$ specified by $(C, \alpha^m, D^m, \gamma^m)$.

Proof. We need to show that
$Op(\gamma^m(d_1), \gamma^m(d_2)) \sqsubseteq_C \gamma^m(Op_D(d_1, d_2))$ for all
$d_1, d_2 \in D$.

Note that from Definition 3 it follows that $Op$ is monotonic, i.e. if
$c_1 \sqsubseteq_C c_1'$ and $c_2 \sqsubseteq_C c_2'$ then
$Op(c_1, c_2) \sqsubseteq_C Op(c_1', c_2')$. Since $(D, Op_D)$ is complete,
$Op_D = \alpha \circ Op \circ \gamma$. Hence since $Op$, $\alpha$, $\gamma$ are all
monotonic, $Op_D$ is also monotonic. Now

¹ Note that to simplify the notation we assume that a JI operation has at most two
input arguments. The results presented can easily be extended to operations with
any number of arguments.
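$$Op(\gamma^m(d_1), \gamma^m(d_2)) = \bigsqcup_C \{op(x_1, x_2) \mid x_1, x_2 \in JI(C) \wedge x_1 \sqsubseteq_C \gamma^m(d_1) \wedge x_2 \sqsubseteq_C \gamma^m(d_2)\}.$$

Therefore it is sufficient to show that
$op(x_1, x_2) \sqsubseteq_C \gamma^m(Op_D(d_1, d_2))$ for all $x_1, x_2 \in JI(C)$ such
that $x_1 \sqsubseteq_C \gamma^m(d_1)$ and $x_2 \sqsubseteq_C \gamma^m(d_2)$. Now
$x_1 \sqsubseteq_C \gamma^m(d_1)$ implies $\alpha^m(x_1) \sqsubseteq_{D^m} d_1$ and
$x_2 \sqsubseteq_C \gamma^m(d_2)$ implies $\alpha^m(x_2) \sqsubseteq_{D^m} d_2$. Hence
since $Op_D$ is monotonic,

$$Op_D(\alpha^m(x_1), \alpha^m(x_2)) \sqsubseteq_{D^m} Op_D(d_1, d_2).$$

But $x_1, x_2 \in JI(C)$, thus
$Op_D(\alpha^m(x_1), \alpha^m(x_2)) = Op_D(\alpha(x_1), \alpha(x_2))$. Since $Op_D$ is
complete,

$$Op_D(\alpha(x_1), \alpha(x_2)) = \alpha(Op(x_1, x_2)) = \alpha(op(x_1, x_2)).$$

By Definition 3, $op(x_1, x_2) \in JI(C)$ and so
$\alpha(op(x_1, x_2)) = \alpha^m(op(x_1, x_2))$. Thus
$\alpha^m(op(x_1, x_2)) \sqsubseteq_{D^m} Op_D(d_1, d_2)$ and so
$op(x_1, x_2) \sqsubseteq_C \gamma^m(Op_D(d_1, d_2))$. □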

Example 5. The abstract projection function for Pos, $Proj^{Pos}_V : Pos \to Pos$,
amounts to existentially quantifying a formula (see [6] for details). It is shown
that $(Pos, Proj^{Pos}_V)$ is complete in Lemma 36 [6]². Therefore by Lemma 7,
$(Pos^m, Proj^{Pos}_V)$ is a correct abstract interpretation of $(Cl, Proj_V)$.

The abstract projection function for Sharing, $Proj^{sh}_V : SH \to SH$, is defined
such that

$$Proj^{sh}_V(S) = \{s \cap V \mid s \in S\}.$$

Theorem 5.2 [5] shows that $(SH, Proj^{sh}_V)$ is complete. Therefore by Lemma 7,
$(SH^m, Proj^{sh}_V)$ is a correct abstract interpretation of $(Cl, Proj_V)$.

On the other hand, [6] shows that $(Pos, Unif^{Pos})$ is not complete and [5]
shows that $(SH, Unif^{sh})$ is not complete. □

In fact, it can be shown that both $(Pos^m, Unif^{Pos})$ and $(SH^m, Unif^{sh})$ are
not correct abstract interpretations of $(Cl, Unif)$.

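Lemma 8. $(Pos^m, Unif^{Pos})$ is not a correct abstract interpretation of
$(Cl, Unif)$.

Proof. It is sufficient to find $\phi \in Pos^m$ such that

$$Unif^{Pos}(\phi, \phi) \not\models \alpha^m_{Pos}(Unif(\gamma^m_{Pos}(\phi), \gamma^m_{Pos}(\phi))).$$

Let $\phi$ be the formula $x \leftrightarrow y$ and $\theta_1 = \{x \mapsto f(1, y)\}$ and
$\theta_2 = \{x \mapsto f(y, 1)\}$. Note that $\theta_1, \theta_2 \in \gamma^m_{Pos}(\phi)$.
Now $unif(\theta_1, \theta_2) = \{x \mapsto f(1, 1), y \mapsto 1\}$ and so it follows that

$$\alpha^m_{Pos}(Unif(\gamma^m_{Pos}(\phi), \gamma^m_{Pos}(\phi))) \models x \wedge y.$$

But $Unif^{Pos}(\phi, \phi) = \phi$ and so
$Unif^{Pos}(\phi, \phi) \not\models \alpha^m_{Pos}(Unif(\gamma^m_{Pos}(\phi), \gamma^m_{Pos}(\phi)))$,
as required. □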

² Note that in [6] and [5], Pos and Sharing are formulated differently from our
presentation. In [6] and [5], however, it is evident that the proofs can be adapted.
Lemma 9. $(SH^m, Unif^{sh})$ is not a correct abstract interpretation of
$(Cl, Unif)$.

Proof. It is sufficient to find $S \in SH^m$ and a binding $x \mapsto t$ such that

$$Unif^{sh}(S, x \mapsto t) \not\subseteq \alpha^m_{sh}(Unif(\gamma^m_{sh}(S), \{\{x \mapsto t\}\})).$$

Let $S = \{\{x, y\}\}$, $t = f(1, y)$ and $\theta = \{x \mapsto f(y, 1)\}$. Note that
$\theta \in \gamma^m_{sh}(S)$. Now
$unif(\theta, \{x \mapsto t\}) = \{x \mapsto f(1, 1), y \mapsto 1\}$ and so it follows that

$$\alpha^m_{sh}(Unif(\gamma^m_{sh}(S), \{\{x \mapsto t\}\})) = \emptyset.$$

But $Unif^{sh}(S, x \mapsto t) = \{\{x, y\}, \{x\}, \{y\}\}$ and so the result follows. □

Hence new abstract unification operations need to be devised for both $Pos^m$
and $SH^m$.

6 Conclusion 

We have shown how, given an abstract domain $D$ specifying a lattice property $\alpha$,
an abstract domain $D^m$ specifying the mirror property $\alpha^m$ can be constructed.
We have also shown that if $(D, Op_D)$ is a complete abstract interpretation of
$(C, Op_C)$, then $(D^m, Op_D)$ is a correct abstract interpretation of $(C, Op_C)$.

There are instances when non-complete abstract operations computing a
property can be used to improve the precision of operations computing the mirror
property. For example, formulae of the form $x \to y$ in $Pos$ are interpreted as
meaning “x ground implies y ground”. The contrapositive of this is “y non-ground
implies x non-ground”. Thus this information could be used to improve
the precision of a $Pos^m$ analysis. In fact, since non-groundness information is
approximated by freeness information, it would seem reasonable to implement
$Pos^m$ as a reduced product construction with $Pos$ and a domain expressing
freeness information. It would be interesting to see if generalisations of this method
could be meaningfully applied to other domains. Another direction for future
work is to see how our approach relates to lower/upper approximations used in
concurrency [17].
Abstract. We show how linear typing can be used to obtain functional
programs which modify heap-allocated data structures in place.

We present this both as a “design pattern” for writing C-code in a
functional style and as a compilation process from linearly typed first-order
functional programs into malloc()-free C code.

The main technical result is the correctness of this compilation.

The crucial innovation over previous linear typing schemes consists of
the introduction of a resource type ◇ which controls the number of
constructor symbols such as cons in recursive definitions and ensures linear
space while restricting expressive power surprisingly little.

While the space efficiency brought about by the new typing scheme and
the compilation into C can also be realised with state-of-the-art
optimising compilers for functional languages such as Ocaml [15], the
present method provides guaranteed bounds on heap space which will be of
use for applications such as languages for embedded systems or ‘proof
carrying code’ [18].



1 Introduction 

In-place modification of heap-allocated data structures such as lists, trees, queues 
in an imperative language such as C is notoriously cumbersome, error prone, and 
difficult to teach. 

Suppose that a type of lists has been defined¹ in C by

typedef enum {NIL, CONS} kind_t;

typedef struct lnode {
    kind_t kind;
    int hd;
    struct lnode * tl;
} list_t;

and that a function

list_t reverse(list_t l)

should be written which reverses its argument “in place” and returns it. Everyone
who has taught C will agree that even when recursion is used this is not an entirely
trivial task. Similarly, consider a function

list_t insert(int a, list_t l)

which inserts a in the correct position in l (assuming that the latter is sorted)
allocating one struct lnode.

¹ Usually, one encodes the empty list as a NULL-pointer, whereas here it is encoded as
a list_t with kind component equal to NIL. This is more in line with the encoding
of trees we present below. If desired, we could go for the slightly more economical
encoding, the only price being a loss of genericity.

Next, suppose you want to write a function

list_t sort(list_t l)

which sorts its argument in place according to the insertion sort algorithm. Note
that you cannot use the previously defined function insert() here as it allocates
new space.

As a final example, assume that we have defined a type of trees

typedef struct tnode {
    kind_t kind;
    int label;
    struct tnode * left;
    struct tnode * right;
} tree_t;

(with kind_t extended with LEAF, NODE) and that we want to define a function

list_t breadth(tree_t t)

which constructs the list of labels of tree t in breadth-first order by consuming
the space occupied by the tree and allocating at most one extra struct lnode.
While again, there is no doubt that this can be done, my experience is that all
of the above functions are cumbersome to write, difficult to verify, and likely to
contain bugs.

Now compare this with the ease with which such functions are written in a
functional language such as Ocaml [15]. For instance,

let reverse l = let rec rev_aux l acc =
                  match l with
                    [] -> acc
                  | a::l -> rev_aux l (a::acc)
                in rev_aux l []

type tree = Leaf of int
          | Node of int*tree*tree

let rec breadth t = let rec breadth_aux l =
                      match l with
                        [] -> []
                      | Leaf(a)::t -> a::breadth_aux(t)
                      | Node(a,l,r)::t -> a::breadth_aux(t @ [l] @ [r])
                    in breadth_aux [t]

These definitions are written in a couple of minutes and are readily verified using
induction and equational reasoning.

The difference, of course, is that the functional programs do not modify their 
argument in place but rather construct the result anew by allocating fresh heap 
space. 

If the argument is not needed anymore it will eventually be reclaimed by 
garbage collection, but we have no guarantee whether and when this will happen. 
Accordingly, the space usage of a functional program will in general be bigger 
and less predictable than that of the corresponding C program. 

The aim of this paper is to show that by imposing mild extra annotations 
one can have the best of both worlds: easy to write code which is amenable to 
equational reasoning, yet modifies its arguments in place and does not allocate 
heap space unless explicitly told to do so. 

We will describe a linearly² typed functional programming language with
lists, trees, and other heap-allocated data structures which admits a compilation
into malloc()-free C. This may seem paradoxical at first sight because one should
think that at least a few heap allocations would be necessary to generate initial
data. However, our type system is such that while it does allow for the definition
of functions such as the above examples, it does not allow one to define constant
terms of heap-allocated type other than trivial ones like nil.

If we want to apply these functions to concrete data we either move out- 
side the type system or we introduce an extension which allows for controlled 
introduction of heap space. However, in order to develop and verify functions as 
opposed to concrete computations doing so will largely be unnecessary. 

This is made possible in a natural way through the presence of a special
resource type ◇ which in fact is the main innovation of the present system over
earlier linear type systems, see Section 6.

While experiments with “hand-compiled” examples show that the generated
C-code can compete with the highly optimised Ocamlopt native code compiler
and outperforms the Ocaml run time system by far, we believe that the
efficient space usage can also be realised by state-of-the-art garbage collection and
caching.

The main difference is that we can prove that the code generated by our 
compilation comes with an explicit bound on the heap space used (none at all in 
the pure system, a controllable amount in an extension with an explicit allocation 
operator). This will make our system useful in situations where space economy 
and guaranteed resource bounds are of the essence. Examples are programming 
languages for embedded systems (see [12] for a survey) or “proof-carrying code”. 

In a nutshell the approach works as follows. The type ◇ (dia_t in the C
examples) gets translated into a pointer type, say void *, whose values point to
heap space of appropriate size to store one list or tree node. It is the task of the
type system to maintain the invariant that overwriting such heap space does not
affect the result.

When invoking a recursive constructor function such as cons() or node()
one must supply an appropriate number of arguments of type ◇ to provide the
required heap space. Conversely, if in a recursion an argument of list or tree type
is decomposed these ◇-values become available again.

Linear typing then ensures that overwriting the heap space pointed to by
these ◇-values is safe.

² We always use “linear” in the sense of “affine linear”, i.e. arguments may be used at
most once.

It is important to realise that the C programs obtained as the target of the
translation do not involve malloc() and therefore must necessarily update their
heap allocated arguments in place. Traditional functional programs may achieve
the same global space usage by clever garbage collection, but there will be no
guarantee that under all circumstances this efficiency will be realised.

We also point out that while the language we present is experimental the
examples we can treat are far from trivial: insertion sort, quick sort, breadth
first traversal using queues, Huffman’s algorithm, and many more. We therefore
are led to believe that with essentially engineering effort our system could be
turned into a usable programming language for the abovementioned applications.



2 Functional Programming with C 



Before presenting the language we show what the translated code will look like
by way of some direct examples.

For the above-defined list type we would make the following definitions:

typedef void * dia_t;

and

list_t nil(){
    list_t res;
    res.kind=NIL;
    return res;
}

and

list_t cons(dia_t d, int hd, list_t tl){
    list_t res;
    res.kind = CONS;
    res.hd = hd;
    *(list_t *)d = tl;
    res.tl = (list_t *)d;
    return res;
}

followed by

typedef struct {
    kind_t kind;
    dia_t d;
    int hd;
    list_t tl;
} list_destr_t;

and

list_destr_t list_destr(list_t l) {
    list_destr_t res;
    res.kind = l.kind;
    if (res.kind == CONS) {
        res.hd = l.hd;
        res.d = (void *) l.tl;
        res.tl = *l.tl;
    }
    return res;
}

The function nil() simply returns an empty list on the stack. The function
cons() takes a pointer to free heap space (d), an entry (hd) and a list (tl)
and returns on the stack a list with hd-field equal to hd and tl-field pointing
to a heap location containing tl. This latter heap location is of course the one
explicitly provided through the argument d.

The destructor function list_destr(), finally, takes a list (l) and returns a
structure containing a field kind with value CONS iff l.kind equals CONS and
in this case containing in the remaining fields head and tail of l, as well as a
pointer to a free heap location capable of storing a list node (d).

Once we have made these definitions we can implement reverse() in a
functional style as follows:

list_t rev_aux(list_t l0, list_t acc) {
    list_destr_t l = list_destr(l0);
    return l.kind==NIL ? acc
         : rev_aux(l.tl, cons(l.d, l.hd, acc));
}

list_t reverse(list_t l) {
    return rev_aux(l, nil());
}

Notice that reverse() updates its argument in place, as no call to malloc() is
being made.
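The definitions above can be assembled into one translation unit to observe the
in-place behaviour. In the following sketch the helper build and the checking
function reverse_demo are hypothetical additions. The list cells come from a
caller-supplied stack array instead of the heap, which steps outside the typing
discipline only in order to create initial data. Unlike the original nil(),
every field is initialised here so that copying the structs is well defined.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NIL, CONS } kind_t;
typedef struct lnode { kind_t kind; int hd; struct lnode *tl; } list_t;
typedef void *dia_t;
typedef struct { kind_t kind; dia_t d; int hd; list_t tl; } list_destr_t;

list_t nil(void) { list_t r = { NIL, 0, NULL }; return r; }

/* Builds a CONS node on the stack; the tail is stored in the cell d. */
list_t cons(dia_t d, int hd, list_t tl) {
    list_t res = { CONS, hd, (list_t *)d };
    assert(d != NULL);
    *res.tl = tl;
    return res;
}

/* Splits a list into head, tail (copied out) and the cell that held the tail. */
list_destr_t list_destr(list_t l) {
    list_destr_t res = { l.kind, NULL, 0, { NIL, 0, NULL } };
    if (res.kind == CONS) {
        res.hd = l.hd;
        res.d = (void *)l.tl;
        res.tl = *l.tl;
    }
    return res;
}

list_t rev_aux(list_t l0, list_t acc) {
    list_destr_t l = list_destr(l0);
    return l.kind == NIL ? acc : rev_aux(l.tl, cons(l.d, l.hd, acc));
}

list_t reverse(list_t l) { return rev_aux(l, nil()); }

/* Hypothetical helper: builds [1, ..., n] in the n cells supplied. */
list_t build(list_t *cells, int n) {
    list_t l = nil();
    for (int i = n; i >= 1; i--) l = cons(&cells[i - 1], i, l);
    return l;
}

/* Returns 1 iff reversing [1,2,3] gives [3,2,1] and every node of the
   result still lives in one of the three original cells. */
int reverse_demo(void) {
    list_t cells[3];
    list_t r = reverse(build(cells, 3));
    int expect = 3;
    while (r.kind == CONS) {
        if (r.hd != expect--) return 0;
        if (r.tl < cells || r.tl >= cells + 3) return 0;
        r = *r.tl;
    }
    return expect == 0;
}
```

The range check in reverse_demo is what distinguishes an in-place reversal from
one that builds a fresh list: the result reuses exactly the cells of the input.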

To implement insert() we need an extra argument of type dia_t since this
function, just like cons(), increases the length. So we write:

list_t insert(dia_t d, int a, list_t l0) {
    list_destr_t l = list_destr(l0);
    return l.kind==NIL ? cons(d,a,nil())
         : a <= l.hd   ? cons(d,a,cons(l.d,l.hd,l.tl))
         : cons(d,l.hd,insert(l.d,a,l.tl));
}

Using insert() we can implement insertion sort with in place modification as
follows:

list_t sort(list_t l0) {
    list_destr_t l = list_destr(l0);
    return l.kind==NIL ? nil()
         : insert(l.d,l.hd,sort(l.tl));
}



Notice how the value l.d which becomes available in decomposing l is used to
feed the insert() function.
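A self-contained sketch of the insertion sort may help to see how the cells
circulate. The function sort_demo and its input data are hypothetical; the
list cells are taken from a stack array purely to set up test data, and the
fields of nil() are initialised, which the original code does not do.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NIL, CONS } kind_t;
typedef struct lnode { kind_t kind; int hd; struct lnode *tl; } list_t;
typedef void *dia_t;
typedef struct { kind_t kind; dia_t d; int hd; list_t tl; } list_destr_t;

list_t nil(void) { list_t r = { NIL, 0, NULL }; return r; }

list_t cons(dia_t d, int hd, list_t tl) {
    list_t res = { CONS, hd, (list_t *)d };
    assert(d != NULL);
    *res.tl = tl;                       /* the tail is stored in the cell d */
    return res;
}

list_destr_t list_destr(list_t l) {
    list_destr_t res = { l.kind, NULL, 0, { NIL, 0, NULL } };
    if (res.kind == CONS) {
        res.hd = l.hd;
        res.d = (void *)l.tl;           /* the cell that becomes available */
        res.tl = *l.tl;
    }
    return res;
}

/* Inserts a into the sorted list l0, paying for the new node with d. */
list_t insert(dia_t d, int a, list_t l0) {
    list_destr_t l = list_destr(l0);
    return l.kind == NIL ? cons(d, a, nil())
         : a <= l.hd     ? cons(d, a, cons(l.d, l.hd, l.tl))
         :                 cons(d, l.hd, insert(l.d, a, l.tl));
}

/* Each decomposed node hands its cell l.d to insert. */
list_t sort(list_t l0) {
    list_destr_t l = list_destr(l0);
    return l.kind == NIL ? nil() : insert(l.d, l.hd, sort(l.tl));
}

/* Returns 1 iff sorting [4,1,3,5,2] yields [1,2,3,4,5]. */
int sort_demo(void) {
    int in[5] = { 4, 1, 3, 5, 2 };
    list_t cells[5];
    list_t l = nil();
    for (int i = 4; i >= 0; i--) l = cons(&cells[i], in[i], l);
    l = sort(l);
    for (int expect = 1; expect <= 5; expect++) {
        if (l.kind != CONS || l.hd != expect) return 0;
        l = *l.tl;
    }
    return l.kind == NIL;
}
```

In each call of sort the cell l.d is passed to exactly one call of insert, and
in each branch of insert the cells d and l.d are each used exactly once.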

Finally, let us look at binary int-labelled trees. We define

tree_t leaf(int label) {
    tree_t res;
    res.kind = LEAF;
    res.label = label;
    return res;
}

and

tree_t node(dia_t d1, dia_t d2,
            int label, tree_t l, tree_t r) {
    tree_t res;
    res.kind = NODE;
    res.label = label;
    *(tree_t *)d1 = l;
    *(tree_t *)d2 = r;
    res.left = (tree_t *)d1;
    res.right = (tree_t *)d2;
    return res;
}

followed by

typedef struct {
    kind_t kind;
    int label;
    dia_t d1, d2;
    tree_t left, right;
} tree_destr_t;

and

tree_destr_t tree_destr(tree_t t) {
    tree_destr_t res;
    res.label = t.label;
    if ((res.kind = t.kind) == NODE) {
        res.d1 = (dia_t)t.left;
        res.d2 = (dia_t)t.right;
        res.left = *(tree_t *)t.left;
        res.right = *(tree_t *)t.right;
    }
    return res;
}

Notice that we must pay two ◇s in order to build a tree node. In exchange, two
◇s become available when we decompose a tree.

To implement breadth we have to define a type listtree_t of lists of trees
analogous to list_t with int replaced by tree_t. Of course, the associated
helper functions need to get distinct names such as niltree(), etc.

We can then define a function br_aux with prototype

list_t br_aux(listtree_t l)

by essentially mimicking the functional definition above (the complete code is omitted
for lack of space) and obtain the desired function breadth as

list_t breadth(dia_t d, tree_t t) {
    return br_aux(cons(d,t,nil()));
}

Notice that the type of breadth shows that the result requires one memory region more 
than the input. 

All these functions do not use dynamic memory allocation because the heap space 
needed to store the result can be taken from the argument. To construct concrete lists 
in the first place we need of course dynamic memory allocation. The full paper shows 
how this can be accommodated in a controlled fashion. Of course, for these programs 
to be correct it is crucial that we do not overwrite heap space which is still in use. The 
main message of this paper is that this can be guaranteed systematically by adhering 
to a linear typing discipline. 

In other words, a function must use its argument at most once. 

For instance, the following code which attempts to double the size of its argument
would be incorrect:

list_t twice(list_t l0) {
    list_destr_t l = list_destr(l0);
    return l.kind==NIL ? nil()
         : cons(l.d,0,(cons(l.d,0,twice(l.tl))));
}

Rather than returning a list of 0’s twice the size of its input it returns a circular list!
A similar effect happens if we replace the last line of the code for insert() by

cons(d,l.hd,insert(d,a,l.tl));

In each case the reason is the double usage of the ◇-values d and l.d.
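The circularity can be observed directly. The following sketch repeats the list
definitions and the ill-typed twice, and adds a hypothetical function
twice_is_circular that applies twice to a one-element list whose single cell
lives on the stack (an assumption made only to obtain test data).

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NIL, CONS } kind_t;
typedef struct lnode { kind_t kind; int hd; struct lnode *tl; } list_t;
typedef void *dia_t;
typedef struct { kind_t kind; dia_t d; int hd; list_t tl; } list_destr_t;

list_t nil(void) { list_t r = { NIL, 0, NULL }; return r; }

list_t cons(dia_t d, int hd, list_t tl) {
    list_t res = { CONS, hd, (list_t *)d };
    assert(d != NULL);
    *res.tl = tl;
    return res;
}

list_destr_t list_destr(list_t l) {
    list_destr_t res = { l.kind, NULL, 0, { NIL, 0, NULL } };
    if (res.kind == CONS) {
        res.hd = l.hd;
        res.d = (void *)l.tl;
        res.tl = *l.tl;
    }
    return res;
}

/* Deliberately violates linearity: the cell l.d is used twice. */
list_t twice(list_t l0) {
    list_destr_t l = list_destr(l0);
    return l.kind == NIL ? nil()
         : cons(l.d, 0, cons(l.d, 0, twice(l.tl)));
}

/* Returns 1 iff twice applied to [7] yields a node whose tail cell
   contains a node pointing back to that same cell. */
int twice_is_circular(void) {
    list_t cell;
    list_t l = cons(&cell, 7, nil());
    list_t r = twice(l);
    return r.kind == CONS && r.tl == &cell && r.tl->tl == r.tl;
}
```

The inner cons stores the empty tail in the cell, and the outer cons then
overwrites the same cell with a node whose tail pointer is that very cell, so
following tl from the result never reaches NIL.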
3 A Linear Functional Programming Language

We will now introduce a linearly typed functional metalanguage and translate it
systematically into C. This will be done with the following aims. First, it allows us to
formally prove the correctness of the methodology sketched above. Second, it will relieve
us from having to rewrite similar code many times. Suppose, for instance, you wanted
to use lists of trees (as needed to implement breadth first search). Then all the basic
list code (list_t, nil(), cons(), etc.) will have to be rewritten (this problem could
presumably also be overcome through the use of C++ templates [13]). Thirdly, a
formalised language with linear type system will allow us to enforce the usage restrictions
on which the correctness of the above code relies. Finally, this will open up the
possibility to extend the metalanguage to a fully-fledged functional language which would
be partly compiled into C whenever this is possible and executed in the traditional
functional way when this is not the case.



3.1 Syntax and Typing Rules 

The zero-order types are given by the following grammar.

$$A ::= N \mid \Diamond \mid L(A) \mid T(A) \mid A_1 \otimes A_2$$

More type formers such as sum types, records, and variants can easily be added.

A first-order type is an expression of the form $T = (A_1, \ldots, A_n) \to B$ where
$A_1 \ldots A_n$ and $B$ are zero-order types.

A signature $\Sigma$ is a partial function from identifiers (thought of as function
symbols) to first-order types.

A typing context $\Gamma$ is a finite function from identifiers (thought of as parameters)
to zero-order types; if $x \notin dom(\Gamma)$ then we write $\Gamma, x{:}A$ for the
extension of $\Gamma$ with $x \mapsto A$. More generally, if
$dom(\Gamma) \cap dom(\Delta) = \emptyset$ then we write $\Gamma, \Delta$ for the disjoint
union of $\Gamma$ and $\Delta$. If such notation appears in the premise of a rule below it
is implicitly understood that these disjointness conditions are met.

Types not including L(−), T(−), ◇ are called heap-free, e.g. N and N ⊗ N are
heap-free.

Let $\Sigma$ be a signature. The typing judgement $\Gamma \vdash_\Sigma e : A$ read
“expression e has type A in typing context $\Gamma$ and signature $\Sigma$” is defined by
the following rules.



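$$\frac{x \in dom(\Gamma)}{\Gamma \vdash_\Sigma x : \Gamma(x)} \quad (\text{Var})$$

$$\frac{\Sigma(f) = (A_1, \ldots, A_n) \to B \qquad \Gamma_i \vdash_\Sigma e_i : A_i \text{ for } i = 1 \ldots n}{\Gamma_1, \ldots, \Gamma_n \vdash_\Sigma f(e_1, \ldots, e_n) : B} \quad (\text{Sig})$$

$$\frac{\Gamma, x{:}A, y{:}A \vdash_\Sigma e : B \qquad A \text{ heap-free}}{\Gamma, x{:}A \vdash_\Sigma e[x/y] : B} \quad (\text{Contr})$$

$$\frac{c \text{ a C integer constant}}{\Gamma \vdash_\Sigma c : N} \quad (\text{Const})$$

$$\frac{\Gamma \vdash_\Sigma e_1 : N \qquad \Delta \vdash_\Sigma e_2 : N \qquad \star \text{ a C infix opn.}}{\Gamma, \Delta \vdash_\Sigma e_1 \star e_2 : N} \quad (\text{Infix})$$

$$\frac{\Gamma \vdash_\Sigma e : N \qquad \Delta \vdash_\Sigma e' : A \qquad \Delta \vdash_\Sigma e'' : A}{\Gamma, \Delta \vdash_\Sigma \text{if } e \text{ then } e' \text{ else } e'' : A} \quad (\text{If})$$

$$\frac{\Gamma \vdash_\Sigma e : A \qquad \Delta \vdash_\Sigma e' : B}{\Gamma, \Delta \vdash_\Sigma e \otimes e' : A \otimes B} \quad (\text{Pair})$$

$$\frac{\Gamma \vdash_\Sigma e : A \otimes B \qquad \Delta, x{:}A, y{:}B \vdash_\Sigma e' : C}{\Gamma, \Delta \vdash_\Sigma \text{match } e \text{ with } x \otimes y \Rightarrow e' : C} \quad (\text{Split})$$

$$\frac{}{\Gamma \vdash_\Sigma \text{nil} : L(A)} \quad (\text{Nil})$$

$$\frac{\Gamma_d \vdash_\Sigma e_d : \Diamond \qquad \Gamma_h \vdash_\Sigma e_h : A \qquad \Gamma_t \vdash_\Sigma e_t : L(A)}{\Gamma_d, \Gamma_h, \Gamma_t \vdash_\Sigma \text{cons}(e_d, e_h, e_t) : L(A)} \quad (\text{Cons})$$

$$\frac{\Gamma \vdash_\Sigma e : L(A) \qquad \Delta \vdash_\Sigma e_{nil} : B \qquad \Delta, d{:}\Diamond, h{:}A, t{:}L(A) \vdash_\Sigma e_{cons} : B}{\Gamma, \Delta \vdash_\Sigma \text{match } e \text{ with nil} \Rightarrow e_{nil} \mid \text{cons}(d, h, t) \Rightarrow e_{cons} : B} \quad (\text{List-Elim})$$

$$\frac{\Gamma \vdash_\Sigma e : A}{\Gamma \vdash_\Sigma \text{leaf}(e) : T(A)} \quad (\text{Leaf})$$

$$\frac{\Gamma_{d1} \vdash_\Sigma e_{d1} : \Diamond \qquad \Gamma_{d2} \vdash_\Sigma e_{d2} : \Diamond \qquad \Gamma_a \vdash_\Sigma e_a : A \qquad \Gamma_l \vdash_\Sigma e_l : T(A) \qquad \Gamma_r \vdash_\Sigma e_r : T(A)}{\Gamma_{d1}, \Gamma_{d2}, \Gamma_a, \Gamma_l, \Gamma_r \vdash_\Sigma \text{node}(e_{d1}, e_{d2}, e_a, e_l, e_r) : T(A)} \quad (\text{Node})$$

$$\frac{\Gamma \vdash_\Sigma e : T(A) \qquad \Delta, a{:}A \vdash_\Sigma e_{leaf} : B \qquad \Delta, d_1{:}\Diamond, d_2{:}\Diamond, a{:}A, l{:}T(A), r{:}T(A) \vdash_\Sigma e_{node} : B}{\Gamma, \Delta \vdash_\Sigma \text{match } e \text{ with leaf}(a) \Rightarrow e_{leaf} \mid \text{node}(d_1, d_2, a, l, r) \Rightarrow e_{node} : B} \quad (\text{Tree-Elim})$$

Remarks. The symbol $\star$ in rule Infix ranges over a set of binary infix operations such
as +, -, /, *, <=, ==, ... We may include more such operations and also other
base types such as floating point numbers or characters.

As usual, we omit type annotations wherever possible. The constructs involving
match bind variables.

Application of function symbols or operations to their operands is linear in the
sense that several operands must in general not share common free variables. This is
because of the implicit side condition on juxtaposition of contexts mentioned above. In
view of rule Contr, however, variables of a heap-free type may be shared and moreover
the same free variable may appear in different branches of a case distinction, as follows
e.g. from the form of rule If. Here is how we typecheck x + x when x:N. First, we have
$x{:}N \vdash x : N$ and $y{:}N \vdash y : N$ by Var. Then $x{:}N, y{:}N \vdash x + y : N$ by
Infix and finally $x{:}N \vdash x + x : N$ by rule Contr. It follows by standard
type-theoretic techniques that typechecking for this system is decidable in linear time.

Programs. A program consists of a signature $\Sigma$ and for each symbol

$$f : (A_1, \ldots, A_n) \to B$$

contained in $\Sigma$ a term $e_f$ such that

$$x_1{:}A_1, \ldots, x_n{:}A_n \vdash_\Sigma e_f : B.$$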
3.2 Set-Theoretic Interpretation

In order to specify the purely functional meaning of programs we introduce a
set-theoretic interpretation as follows: types are interpreted as sets by

$$[\![N]\!] = \mathbb{Z}$$
$$[\![\Diamond]\!] = \{0\}$$
$$[\![L(A)]\!] = \text{finite lists over } [\![A]\!]$$
$$[\![T(A)]\!] = \text{binary } [\![A]\!]\text{-labelled trees}$$
$$[\![A \otimes B]\!] = [\![A]\!] \times [\![B]\!]$$

To each program $(\Sigma, (e_f)_{f \in dom(\Sigma)})$ we can now associate a mapping $\rho$
such that $\rho(f)$ is a partial function from $[\![A_1]\!] \times \ldots \times [\![A_n]\!]$
to $[\![B]\!]$ for each $f : (A_1, \ldots, A_n) \to B$.

This meaning is given in the standard fashion as the least fixpoint of an appropriate
compositionally defined operator:

A valuation of a context $\Gamma$ is a function $\eta$ such that
$\eta(x) \in [\![\Gamma(x)]\!]$ for each $x \in dom(\Gamma)$; a valuation of a signature
$\Sigma$ is a function $\rho$ such that $\rho(f) \in [\![\Sigma(f)]\!]$ whenever
$f \in dom(\Sigma)$. It is valid if it interprets the constructors and destructors for
lists and trees by the eponymous set-theoretic operations.

To each expression $e$ such that $\Gamma \vdash_\Sigma e : A$ we assign an element
$[\![e]\!]_{\eta,\rho} \in [\![A]\!] \cup \{\bot\}$ in the obvious way, i.e. function symbols
and variables are interpreted according to the valuations; basic functions and expression
formers are interpreted by the eponymous set-theoretic operations, ignoring the arguments
of type ◇ in the case of constructor functions. The formal definition of
$[\![-]\!]_{\eta,\rho}$ is by induction on terms. A program
$(\Sigma, (e_f)_{f \in dom(\Sigma)})$ is interpreted as the least valuation $\rho$ such that

$$\rho(f)(v_1, \ldots, v_n) = [\![e_f]\!]_{\eta,\rho}$$

where $\eta(x_i) = v_i$.

We stress that this set-theoretic semantics does not say anything about space usage.
Its only purpose is to pin down the functional denotations of programs so that we can
formally state what it means to implement a function. Accordingly, the resource type
is interpreted as a singleton set and ⊗ product is interpreted as cartesian product.

It will be our task to show that the malloc()-free interpretation of our language is
faithful with respect to the set-theoretic semantics. Once this is done, the user of the
language can think entirely in terms of the semantics as far as extensional verification
and development of programs is concerned. In addition, he or she can benefit from the
resource bounds obtained from the interpretation but need not worry about how these
are guaranteed.



3.3 Examples 

Reverse:

rev_aux : (L(N), L(N)) → L(N)
reverse : (L(N)) → L(N)

e_rev_aux(l, acc) = match l with
      nil ⇒ acc
    | cons(d, h, t) ⇒ rev_aux(t, cons(d, h, acc))
e_reverse(l) = rev_aux(l, nil())

Insertion sort:

insert : (◇, N, L(N)) → L(N)
sort : (L(N)) → L(N)

e_insert(d, a, l) = match l with
      nil ⇒ cons(d, a, nil())
    | cons(d', b, l) ⇒ if a ≤ b
                          then cons(d, a, cons(d', b, l))
                          else cons(d, b, insert(d', a, l))
e_sort(l) = match l with
      nil ⇒ nil
    | cons(d, a, l) ⇒ insert(d, a, sort(l))

Breadth-first search:

snoc : (◇, L(T(N)), T(N)) → L(T(N))
breadth : (L(T(N))) → L(N)

e_snoc(d, l, t) = match l with
      nil ⇒ cons(d, t, nil())
    | cons(d', t', q) ⇒ cons(d', t', snoc(d, q, t))
e_breadth(q) = match q with
      nil ⇒ nil
    | cons(d, t, q) ⇒ match t with
            leaf(a) ⇒ cons(d, a, breadth(q))
          | node(d1, d2, a, l, r) ⇒ cons(d, a,
                                          breadth(snoc(d2, snoc(d1, q, l), r)))

Other examples we have tried out include quicksort, treesort, and the Huffman algo- 
rithm. 
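The C code for breadth-first traversal is omitted in Section 2; a minimal
sketch following the definitions of snoc and breadth above might look as
follows. The names listtree_t, niltree, constree, cell_t and breadth_demo are
assumptions made for this illustration, the destructors are inlined as field
accesses for brevity, and one union type is used for all cells because a
◇-value released by a tree node is later reused for a list node.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NIL, CONS, LEAF, NODE } kind_t;
typedef struct tnode { kind_t kind; int label; struct tnode *left, *right; } tree_t;
typedef struct lnode { kind_t kind; int hd; struct lnode *tl; } list_t;
typedef struct ltnode { kind_t kind; tree_t hd; struct ltnode *tl; } listtree_t;
typedef void *dia_t;
/* Assumption: every cell is large enough for any of the three node types. */
typedef union { list_t l; tree_t t; listtree_t q; } cell_t;

tree_t leaf(int label) { tree_t r = { LEAF, label, NULL, NULL }; return r; }

tree_t node(dia_t d1, dia_t d2, int label, tree_t l, tree_t r) {
    tree_t res = { NODE, label, (tree_t *)d1, (tree_t *)d2 };
    assert(d1 != NULL && d2 != NULL && d1 != d2);
    *res.left = l;
    *res.right = r;
    return res;
}

list_t nil(void) { list_t r = { NIL, 0, NULL }; return r; }

list_t cons(dia_t d, int hd, list_t tl) {
    list_t res = { CONS, hd, (list_t *)d };
    *res.tl = tl;
    return res;
}

listtree_t niltree(void) {
    listtree_t r = { NIL, { LEAF, 0, NULL, NULL }, NULL };
    return r;
}

listtree_t constree(dia_t d, tree_t hd, listtree_t tl) {
    listtree_t res;
    res.kind = CONS; res.hd = hd; res.tl = (listtree_t *)d;
    *res.tl = tl;
    return res;
}

/* Appends t to the end of the queue l, paying with the cell d. */
listtree_t snoc(dia_t d, listtree_t l, tree_t t) {
    if (l.kind == NIL) return constree(d, t, niltree());
    dia_t d1 = l.tl;            /* cell released by decomposing l */
    listtree_t q = *l.tl;       /* copy the tail out before the cell is reused */
    return constree(d1, l.hd, snoc(d, q, t));
}

/* Consumes a queue of trees and returns their labels breadth-first. */
list_t breadth(listtree_t q0) {
    if (q0.kind == NIL) return nil();
    dia_t d = q0.tl;
    tree_t t = q0.hd;
    listtree_t q = *q0.tl;
    if (t.kind == LEAF) return cons(d, t.label, breadth(q));
    dia_t d1 = t.left, d2 = t.right;
    tree_t l = *t.left, r = *t.right;
    return cons(d, t.label, breadth(snoc(d2, snoc(d1, q, l), r)));
}

/* Returns 1 iff the tree 1(2(4,5),3) is traversed as [1,2,3,4,5]. */
int breadth_demo(void) {
    static cell_t c[5];
    tree_t n2 = node(&c[0], &c[1], 2, leaf(4), leaf(5));
    tree_t n1 = node(&c[2], &c[3], 1, n2, leaf(3));
    list_t r = breadth(constree(&c[4], n1, niltree()));
    for (int expect = 1; expect <= 5; expect++) {
        if (r.kind != CONS || r.hd != expect) return 0;
        r = *r.tl;
    }
    return r.kind == NIL;
}
```

The tree occupies four cells and the initial queue one more, and the resulting
list of five labels is stored in exactly these five cells.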

Remark 3.1 It can be shown that all definable functions are non-size-increasing, e.g.,
if $f : (L(N)) \to L(N)$ then, semantically, $|f(l)| \le |l|$. This would not be the case
if we omitted the ◇ argument in cons, even if we kept linearity. We would then, for
example, have the function f(l) = cons(0, l) which increases the length. The presence
of such a function in the body of a recursive definition gives rise to arbitrarily long
lists.

3.4 Compilation into C 

By following the pattern of the examples in the introduction it is possible to associate
a piece of C code to each program $P = (\Sigma, (e_f)_{f \in \mathrm{dom}(\Sigma)})$ in such a way that

1. To each zero-order type $A$ occurring in $P$ a unique C identifier $\nu(A)$ is associated,
and the code contains an appropriate type definition of this identifier along with
appropriately typed helper functions, e.g. $\nu(A)$_cons, $\nu(A)$_list_destr when $A = L(\ldots)$.

2. For each function symbol $f : (A_1, \ldots, A_n) \to B$ defined in $P$ the code contains
a corresponding definition $[\![f]\!]^{C}$ of a function $f$ with prototype

$\nu(B)\ f(\nu(A_1)\ x_1, \ldots, \nu(A_n)\ x_n)$

3. Whenever $\Gamma \vdash_\Sigma e : A$ then we can exhibit a C expression $[\![e]\!]^{C}$ of type $\nu(A)$
involving the identifiers in $\Gamma$ and in $\Sigma$.
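As an illustration of these three points, here is a hand-written sketch of what the C code associated with the reverse example might look like. The paper's own C examples are in the introduction and may differ; the names `list_t`, `kind_t`, `list_cons` and `list_destr` are assumptions modelled on the struct of Section 3.5 and on the helper names mentioned in item 1, and ◇ is represented as a `void *` to a free cell.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NIL, CONS } kind_t;

/* Type definition for L(N): a stack value whose tl points to a heap cell. */
typedef struct lnode {
    kind_t kind;
    int hd;
    struct lnode *tl;
} list_t;

/* Helper for cons(d, h, t): the tail t is written into the free cell d. */
static list_t list_cons(void *d, int h, list_t t)
{
    list_t r;
    assert(d != NULL);
    *(list_t *)d = t;          /* in-place update of the cell d */
    r.kind = CONS;
    r.hd = h;
    r.tl = (list_t *)d;
    return r;
}

/* Helper for "match l with cons(d, h, t)": returns 0 if l is nil,
   otherwise hands back the cell d, the head h and a copy of the tail t. */
static int list_destr(list_t l, void **d, int *h, list_t *t)
{
    if (l.kind == NIL) return 0;
    *d = l.tl;
    *h = l.hd;
    *t = *l.tl;
    return 1;
}

/* Function symbol rev_aux : (L(N), L(N)) -> L(N). */
list_t rev_aux(list_t l, list_t acc)
{
    void *d;
    int h;
    list_t t;
    if (!list_destr(l, &d, &h, &t)) return acc;
    return rev_aux(t, list_cons(d, h, acc));
}

/* Function symbol reverse : (L(N)) -> L(N). */
list_t reverse(list_t l)
{
    list_t nil = { NIL, 0, NULL };
    return rev_aux(l, nil);
}
```

The tail is copied out of the cell before `list_cons` overwrites it, so each cell released by pattern matching is immediately reused for the accumulator.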

The details of this translation are omitted for lack of space; its gist is, however,
contained in the examples from the introduction.

3.5 Correctness of the Translation

We now have to show that the translation of a program P computes the partial 
functions defined by the set-theoretic interpretation p of P. Since we have not given 
all details of the translation we must content ourselves with a sketch of the correctness 
theorem and its proof which should hopefully allow the inclined reader to reconstruct 
it in full. 

For each zero-order type $A$ we define the set $V(A)$ as the set of pairs $(v, H)$ where
$v$ is a C stack value of type $\nu(A)$ (under the type definitions generated for $P$) and $H$ is a region
in the heap (a set of addresses).

For example, an element of $V(L(N))$ consists of a stack value of

    typedef struct lnode {
        kind_t kind; int hd; struct lnode *tl;
    } list_t;

i.e., a triple $v = (k, h, t)$ where $k, h$ are (4 byte) integers and $t$ is a memory address,
together with a set $H$ of memory addresses. This set of memory addresses is meant to,
but at this point not required to, comprise all addresses reachable from $t$ by iterated
dereferencing.

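Next, we inductively define a relation $\Vdash_A \subseteq V(A) \times [\![A]\!]$ which singles out the values
which “implement” or “correspond to” a given semantic value.

— $(n, \emptyset) \Vdash_N n'$, if $n$ encodes $n'$.

— $(p, H) \Vdash_\Diamond 0$, if $H$ is a contiguous region of size $\max\{\mathrm{sizeof}(\nu(A)) \mid A \text{ occurs in } P\}$
and $p$ points to the beginning of $H$.

— $(v, H) \Vdash_{A \otimes B} (a, b)$ if $H = H_1 \uplus H_2$ and $(v.\mathrm{fst}, H_1) \Vdash_A a$ and $(v.\mathrm{snd}, H_2) \Vdash_B b$.

— $(v, \emptyset) \Vdash_{L(A)} \mathrm{nil}$ if $v.\mathrm{kind} = \mathrm{NIL}$.

— $(v, H) \Vdash_{L(A)} \mathrm{cons}(h, t)$, if $v.\mathrm{kind} = \mathrm{CONS}$ and $H = H_d \uplus H_h \uplus H_t$ and $(v.\mathrm{tl}, H_d) \Vdash_\Diamond 0$
and $(v.\mathrm{hd}, H_h) \Vdash_A h$ and $({*v.\mathrm{tl}}, H_t) \Vdash_{L(A)} t$.

— $(v, H) \Vdash_{T(A)} \mathrm{leaf}(a)$ if $v.\mathrm{kind} = \mathrm{LEAF}$ and $(v.\mathrm{label}, H) \Vdash_A a$.

— $(v, H) \Vdash_{T(A)} \mathrm{node}(a, l, r)$ if $v.\mathrm{kind} = \mathrm{NODE}$ and $H = H_{d_1} \uplus H_{d_2} \uplus H_a \uplus H_l \uplus H_r$
and $(v.\mathrm{left}, H_{d_1}) \Vdash_\Diamond 0$ and $(v.\mathrm{right}, H_{d_2}) \Vdash_\Diamond 0$ and $(v.\mathrm{label}, H_a) \Vdash_A a$ and
$({*v.\mathrm{left}}, H_l) \Vdash_{T(A)} l$ and $({*v.\mathrm{right}}, H_r) \Vdash_{T(A)} r$.

Here $H = H_1 \uplus H_2$ means that $H = H_1 \cup H_2$ and $H_1 \cap H_2 = \emptyset$.

Notice that whenever $A$ is heap-free and $(v, H) \Vdash_A a$ for some $a$ then $H = \emptyset$.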
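For lists of integers the region component of the relation can be computed by following the tl pointers. The sketch below is an illustration only: it assumes the list_t layout shown above, treats each heap cell as a single address (its starting address), and relies on N being heap-free so that only the cells pointed to by tl contribute. The function name `list_region` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NIL, CONS } kind_t;
typedef struct lnode { kind_t kind; int hd; struct lnode *tl; } list_t;

/* Collects into out[] the heap cells that make up the region of the list
   value v. Returns the number of cells, or (size_t)-1 if more than cap
   cells would be needed. */
size_t list_region(list_t v, list_t **out, size_t cap)
{
    size_t n = 0;
    while (v.kind == CONS) {
        if (n == cap) return (size_t)-1;
        out[n++] = v.tl;   /* the cell paid for by this cons */
        v = *v.tl;         /* the tail is the value stored in that cell */
    }
    return n;
}
```

A list of length n therefore owns exactly n cells, which is the quantity that the non-size-increasing property keeps bounded.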

Theorem 3.2 Assume the following:

— a program $P = (\Sigma, (e_f)_{f \in \mathrm{dom}(\Sigma)})$,

— a well typed expression $\Gamma \vdash_\Sigma e : A$,

— for each $x \in \mathrm{dom}(\Gamma)$ a value $(v_x, H_x) \in V(\Gamma(x))$ such that $H_x \cap H_y = \emptyset$ whenever $x \neq y$,

— a mapping $\eta$ such that $(v_x, H_x) \Vdash_{\Gamma(x)} \eta(x)$ for each $x \in \mathrm{dom}(\Gamma)$.

Let $\rho$ be the set-theoretic interpretation of $P$.

Then the evaluation of $[\![e]\!]^{C}$ in a runtime environment which maps
$x \in \mathrm{dom}(\Gamma)$ to $v_x$ will result in a value $v$ such that $(v, H) \Vdash_A [\![e]\!]_{\eta,\rho}$ for some subset
$H \subseteq \bigcup_{x \in \mathrm{dom}(\Gamma)} H_x$; moreover the part of the heap outside of $\bigcup_{x \in \mathrm{dom}(\Gamma)} H_x$ is
left unaffected by the evaluation.

Proof. Straightforward lexicographic induction on evaluation time and length of typing 
derivations. Details are omitted for lack of space. 

It follows by specialising to the defining expressions $e_f$ that a program computes its
set-theoretic interpretation.

4 Extensions

Dynamic allocation As it stands there is no way to create a value of type ◇, so in
particular, it is not possible to create a non-nil constant of list type. The examples show
that this is often not needed. Sometimes, however, dynamic allocation and deallocation
may be required and to this end we can introduce functions new : () -> ◇ and disp :
(◇) -> N. The full paper explains how these are translated and used.
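The translation of these two functions is deferred to the full paper, so the following is only a guess at the simplest possible realisation, with a ◇ value represented as a pointer to a fresh cell. The names `new_cell` and `CELL_SIZE` are hypothetical, and `CELL_SIZE` merely stands in for the maximal cell size used in Section 3.5.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for max{ sizeof(nu(A)) | A occurs in P }; the value is arbitrary. */
enum { CELL_SIZE = 32 };

/* new : () -> diamond. Obtains one fresh cell; may return NULL on failure. */
void *new_cell(void)
{
    return malloc(CELL_SIZE);
}

/* disp : (diamond) -> N. Gives the cell back and returns a dummy number. */
int disp(void *d)
{
    free(d);
    return 0;
}
```

With such functions the static bound on the heap size no longer holds, which is why they are presented as an optional extension.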

Polymorphism, higher-order functions We can extend the language with polymor- 
phism (with two kinds of type variables ranging over zero- and first order types) and 
higher-order functions, both linear and nonlinear. Recursive functions would then be 
defined using a single constant 

rec : ∀X.!(!X -> X)

where X ranges over first-order types. The full paper contains a more detailed discus- 
sion of this point. 

Queues The program for breadth-first search could be made more efficient using queues 
with constant time enqueuing. We can easily add a type former Q(A) (and appropriate 
term formers) which gets translated into linked lists with a pointer to their end. The 
correctness proof carries over with only minor changes. 
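A conventional C representation of such a queue is sketched below. It is an assumption about what "linked lists with a pointer to their end" could look like, not the translation used in the paper: the names `queue_t`, `qnode_t`, `enqueue` and `dequeue` are hypothetical, and the element type is fixed to int for brevity. As with cons, enqueuing consumes a free cell d and dequeuing releases one.

```c
#include <assert.h>
#include <stddef.h>

typedef struct qnode { int val; struct qnode *next; } qnode_t;

/* The queue keeps a pointer to its last node so that enqueue is O(1). */
typedef struct { qnode_t *front; qnode_t *back; } queue_t;

queue_t queue_empty(void)
{
    queue_t q = { NULL, NULL };
    return q;
}

/* enqueue(d, q, a): the new node is built in the free cell d. */
queue_t enqueue(void *d, queue_t q, int a)
{
    qnode_t *n = (qnode_t *)d;
    assert(n != NULL);
    n->val = a;
    n->next = NULL;
    if (q.back == NULL) q.front = n;
    else q.back->next = n;
    q.back = n;
    return q;
}

/* dequeue: returns 0 on the empty queue; otherwise stores the first
   element in *a and hands the released cell back in *d. */
int dequeue(queue_t *q, void **d, int *a)
{
    qnode_t *n = q->front;
    if (n == NULL) return 0;
    *a = n->val;
    *d = n;
    q->front = n->next;
    if (q->front == NULL) q->back = NULL;
    return 1;
}
```

Replacing the two nested snoc calls in breadth by two such enqueue operations would remove the linear traversal of the queue at every node.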

Tail recursion The type system does not impose any restriction on the size of the 
stack. If a bounded stack size is desired, all we need to do is restrict to a tail recursive 
fragment and translate the latter into iteration. 
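The function rev_aux from Section 3.3 is tail recursive, so it serves as a small example of this translation into iteration. The loop below is a hand-written sketch using the list_t layout of Section 3.5; the name `rev_aux_iter` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NIL, CONS } kind_t;
typedef struct lnode { kind_t kind; int hd; struct lnode *tl; } list_t;

/* Iterative version of rev_aux(l, acc): constant stack, no allocation. */
list_t rev_aux_iter(list_t l, list_t acc)
{
    while (l.kind == CONS) {
        list_t *d = l.tl;   /* the cell released by matching on l */
        list_t t = *d;      /* copy the tail before the cell is overwritten */
        *d = acc;           /* build cons(d, h, acc) in place */
        acc.kind = CONS;
        acc.hd = l.hd;
        acc.tl = d;
        l = t;              /* the tail call becomes the next iteration */
    }
    return acc;
}
```

Each iteration corresponds to one unfolding of the recursive definition, with the arguments of the tail call assigned to the loop variables.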

More challenging would be some automatic program transformation which transla- 
tes the existing definition of breadth and similar functions into iterative code. To what 
extent this can be done systematically remains to be seen. It seems that at least for 
linear recursion (only one recursive call) such transformation might always be possible 
using continuations. 

Expressivity In order to study complexity-theoretic expressivity it seems to be a rea- 
sonable abstraction to view the type N as finite, e.g. the set of 32 bit words, and to 
view the heap as infinite. In this case, we have the following expressivity result: 

Theorem 4.1 If $f : N \to N$ is a non-increasing function computable in linear (in
$\log(n)$) space then there exists a program containing a symbol f : (L(N)) -> L(N) such
that $[\![f]\!](m(x)) = m(f(x))$ when $m : N \to \{0,1\}^*$ is an encoding of natural numbers as
lists of 0s and 1s.

Proof. If $f(n)$ is computable in space $c \log(n)$ then we use the type $T = L(N \otimes \cdots \otimes N)$
with $c$ factors to store memory configurations. We obtain f by iterating a one-step
function of type (T) -> T and composing with an initialisation function of type
(L(N)) -> T and an output extraction function of type (T) -> L(N), all of which are readily
seen to be implementable in our system.

If we restrict to a tail recursive fragment then programs can also be evaluated in linear
space so that we obtain a characterisation of linear space.

Recursive types We can extend the type system and the compilation technique to
arbitrary (even nested) first-order recursive types. To that end, we introduce (zero
order) type variables and a new type former $\mu X.A$ which binds $X$ in $A$. Elements of
$\mu X.A$ would be introduced and eliminated using fold and unfold constructs

$$\frac{\Gamma \vdash_\Sigma e : A[(\Diamond \otimes \mu X.A)/X]}{\Gamma \vdash_\Sigma \mathrm{fold}(e) : \mu X.A}\ (\text{Fold}) \qquad \frac{\Gamma \vdash_\Sigma e : \mu X.A}{\Gamma \vdash_\Sigma \mathrm{unfold}(e) : A[(\Diamond \otimes \mu X.A)/X]}\ (\text{Unfold})$$

This, together with coproduct and unit types, allows us to define lists and trees as
recursive datatypes. Notice that this encoding would also charge two ◇s for a tree
constructor.



5 Conclusion 

We have defined a linearly typed first-order language which gives the user explicit 
control over heap space in the form of a resource type. 

A translation of this system into malloc()-free C is given which in the case of simple
examples such as list reversal and quicksort generates the usual textbook solutions with 
in-place update. 

We have shown the correctness of this compilation with respect to a standard set- 
theoretic semantics which disregards linearity and the resource type and demonstrated 
the applicability by a range of small examples. 

The main selling points of the approach are 

1. that it achieves in place update of heap allocated data structures while retaining
the possibility of equational reasoning and induction for verification, and
2. that it generates code which is guaranteed to run in a heap of statically determined 
size. 

This latter point should make the system interesting for applications where resources 
are limited, e.g. computation over the Internet, proof-carrying code, and embedded 
systems. Of course further work, in particular an integration with a fully-fledged
functional language and the possibility of allocating a fixed amount of extra heap space,
will be required. Notice, however, that this latter effect can already be simulated by
using input of the form L(◇ ⊗ A) as opposed to L(A).

Also, a type inference system relieving the user from having to explicitly move
around the ◇-resource might be helpful, although the present system has the
advantage of showing the user in an abstract and understandable way where space is being
consumed. And perhaps some programmers might even enjoy spending and receiving
◇s.

6 Related Work 

While the idea of translating linearly typed functional code directly into C seems to be
new, there exist a number of related approaches aimed at controlling the space usage
of functional programs.

Tofte-Talpin’s region calculus [19] tries to minimise garbage collection by dividing
the heap into a list of regions which are allocated and deallocated according to a stack
discipline. A type system ensures that the deallocation of a region does not destroy
data which is still needed; an inference system [20] generates the required annotations
automatically for raw ML code.

The difference to the present work is not so much the inference mechanism (see 
above) but the fact that even with regions the required heap size is potentially un- 
bounded whereas the present system guarantees that the heap will not grow. Also in 
place update does not take place. 

Hughes and Pareto’s system of sized types annotates list types with their length,
e.g. the reversal function would get type $\forall n.\, L_n(A) \to L_n(A)$. While this system allows
one to estimate the required heap and stack size it does not perform in place update
either (and cannot due to the absence of linear types).

In a similar vein Crary and Weirich [7] have given a type system which allows one to 
formalise and certify informal reasoning about time consumption of recursive programs 
involving lists and trees. Their language is a standard one and no optimisation due to 
heap space reuse is taken into account. 

The relationship between linear types and garbage collection has been recognised
as early as ’87 by Lafont [14], see also [10,1,21,16]. But again, due to the absence of
◇-types, these systems do not provide in place update but merely deallocate a linear
argument immediately after its use.

This effect, however, is already achieved by traditional reference counting which 
may be the reason why linear functional programming hasn’t really got off the ground, 
see also [6]. While the runtime advantages of the present approach might also be realised
through reference counting (and indeed seem to be by the Ocamlopt compiler) the 
distinctive novelty lies in the fact that one can guarantee bounded heap size and obtain 
a simple C program realising it which can be run on any machine or system supporting 
C. 

The type system itself is very similar to the system described by the author in [9]
which in turn was inspired by Caseiro’s analysis of recursive equations [5] and bears
some remote similarity with Bounded Linear Logic [8].

Mention should also be made of Baker’s Linear LISP [2,3] which bears some
similarity to our language. It does not contain the resource type ◇ or a comparable
feature, thus it is not clear how the size of intermediate data structures is limited, cf.
Remark 3.1. Similar ideas, without explicit mention of linearity, are also contained in
Mycroft’s thesis [17].

Other related approaches are uniqueness types in Clean [4], linear ADTs and mo- 
nads [11] which will be compared in the full paper. 

In a seminar talk in Edinburgh, John Reynolds has reported about ongoing work
on using linear types for in-place update. At the time of writing there was no conclusive
result, though, and his attention seems to have since shifted to using linear types for
reasoning about shared heap allocated data structures. This together with a medium
depth literature research leads me to believe that the present article is in fact the first
to successfully apply linear types to the problem of functional in-place update.



Acknowledgement I would like to thank Samson Abramsky for helpful comments and
encouragements. Thanks are also due to Peter Selinger for spotting a shortcoming in
an earlier version of this paper.

Abstract. We propose a new type discipline for the π-calculus in which
secure information flow is guaranteed by static type checking. Secrecy 
levels are assigned to channels and are controlled by subtyping. A be- 
havioural notion of types capturing causality of actions plays an essen- 
tial role for ensuring safe information flow in diverse interactive beha- 
viours, making the calculus powerful enough to embed known calculi for 
type-based security. The paper introduces the core part of the calculus, 
presents its basic syntactic properties, and illustrates its use as a tool 
for programming language analysis by a sound embedding of a secure 
multi-threaded imperative calculus of Volpano and Smith. The embed- 
ding leads to a practically meaningful extension of their original type 
discipline. 



1 Introduction 

In present-day computing environments, a user often employs programs which
are sent or fetched from different sites to achieve her/his goals, either privately
or in an organisation. Such programs may be run as code to do a simple
calculation task or as interactive parallel programs doing I/O operations or
communications, and sometimes deal with secret information, such as private data
of the user or classified data of the organisation. Similar situations may occur
in any computing environment where multiple users share common computing
resources. One of the basic concerns in such a context is to ensure programs do
not leak sensitive data to third parties, either maliciously or inadvertently.
This is one of the key aspects of security concerns, and is often called
secrecy. Since it is difficult to dynamically check secrecy at run-time, it may as
well be verified statically, i.e. from a program text alone [7]. The information
flow analysis [7,11,25] addresses this concern by clarifying conditions when flow 
of information in a program is safe (i.e. high-level information never flows into 
low-level channels). Recent studies [2,35,33] have shown how we can integrate 
the techniques of type inference in programming languages with the ideas of in- 
formation flow analysis, accumulating the basic principles of compositional static 
verification for secure information flow. 

The study of type-based secrecy so far has been done in the context of
functional or imperative calculi that incorporate secrecy. Considering that
concurrency and communication are a norm in modern programming environments,
one may wonder whether a similar study is possible in the framework of process
calculi. There are two technical reasons why such an endeavour can be
interesting. First, process calculi have been accumulating mathematically rigorous
techniques to reason about computation based on communicating processes. In 
particular, given that an equivalence on program phrases plays a basic role for 
semantic justification of a type discipline for secrecy [35], the theories of be- 
havioural equivalences [17,20,26,28], which are a cornerstone in the study of 
process calculi, would offer a semantic basis for safe information flow in
communicating processes. Second, type disciplines for communicating processes have been
widely studied recently, especially in the context of name passing process
calculi such as the π-calculus, e.g. [6,15,20,28,32,36]. Further, recent studies have
shown that name passing calculi enjoy great descriptive power, uniformly repre- 
senting diverse language constructs as name passing processes, including those 
of sequential, concurrent, imperative, functional and object-oriented languages. 
Since many real-life programming languages are equipped with diverse constructs 
from different programming paradigms, it would be interesting to see whether we
can obtain a typed calculus based on name passing in which information flow
involving various language constructs is analysable on a uniform syntactic basis.

Against this background, the present work introduces a typed π-calculus
in which secure information flow is guaranteed by static typing. Secrecy levels 
are attached to channels, and a simple subtyping ensures that interaction is 
always secrecy-safe. Information flow in this context arises as transformation 
of interactive behaviour to another interactive behaviour. Thus the essence of 
secure information flow becomes that a low-level interaction never depends on 
a high-level (or incompatible-level) interaction. Interestingly, this interaction- 
based principle of secure information flow strongly depends on the given type 
structures as prerequisites: that is, even semantically, certain behaviours can 
become either secure or insecure according to the given types. This is because 
types restrict a possible set of behaviours (which act as information in the present 
context), thus affecting the notion of safe information flow itself. For this reason, 
a strong type discipline for name passing processes for linear and deadlock-free 
interaction [6,20,36] plays a fundamental role in the present typed calculus, by 
which we can capture safety of information flow in a wide range of computational 
behaviours, including those of diverse language constructs. This expressiveness 
can be used to embed and analyse typed programming languages for secure 
information flow. In this paper we explore the use of the calculus in this direction 
through a sound embedding of a secure multi-threaded imperative calculus of 
Volpano and Smith [33]. The embedding offers an analysis of the original system
in which the underlying observable scenario is made explicit and is elucidated 
by typed process representation. As a result, we obtain a practically meaningful 
extension of [33] with enlarged typability. We believe this example suggests a 
general use of the proposed framework, given the fundamental importance of 
the notion of observables in the analysis of secure computing systems [25,33,34]. 

Technically speaking, our work follows, on the one hand, Abadi’s work on
type-based secrecy in the π-calculus [1] and the studies on secure information
flow in CCS and CSP [8,24,29,31], and, on the other, the preceding works on
type disciplines for name passing processes. In comparison with [1], the main 
novelty of the present typing system is that it ensures safety of information flow 
for general process behaviours rather than that for ground values, which is often 
essential for the embedding of securely typed programming languages. Compared 
to [8,24,31], a key difference lies in the fundamental role type information plays 
in the present system for defining and guaranteeing secrecy. Further, these works 
are not aimed at ensuring secrecy via static typing. Other notable works on the 
study of security using name passing processes include [3,5]. These works are 
not about information flow analysis, though they do address other aspects of 
secrecy. 

In the context of type disciplines for name passing processes, the full use of
dualised and directed types (cf. §3), as well as their combination with
causality-based dynamic types, is new, though the ideas are implicit in [4,10,14,20,36]. Our
construction is based on graph-based types in [36], incorporating the partial
algebra of types from [15] (the basic idea of modalities used here and in [15] originally
comes from linear logic [10]). The syntax of the present calculus is based on [32],
including, among others, branching and recursion. We use the synchronous version since it
gives a much simpler typing system. The branching and recursion play an
essential role in the type discipline, as we shall discuss in §3. The calculus is soundly
embeddable into the asynchronous π-calculus (also called the ν-calculus [17])
by concise encoding [32]. The operational feasibility of branching and recursion
is further studied in [9,23]. For non-deterministic secrecy in general, security 
literature offers many studies based on probabilistic non-interference, cf. [13]. 
The present calculus and its theory are introduced as a basic stratum for the 
study of secure information flow in typed name passing processes, focussing on 
a simpler realm of possibilistic settings. Incorporation of the probability distri- 
bution in behavioural equivalences [22] is an important subject of future study. 
Further discussions on related works, including comparisons with functional and 
imperative secure calculi, are given in the full version [16]. 

This paper offers a summary of key technical ideas and results, leaving the
detailed theoretical development to the full version [16]. In the remainder, Section 2
informally illustrates the basic ideas using examples. Section 3 introduces types,
subtyping and the typing rules. Section 4 discusses key syntactic properties of 
typed terms. Finally Section 5 presents the embedding result and discusses how 
it suggests an extension of the original type discipline by Volpano and Smith. 

Acknowledgement. We deeply thank anonymous referees for their significant 
comments on an early version. Our thanks also go to Martin Berger, Gavin Lowe, 
Peter O’Hearn, Edmund Robinson and Pasquale Malacaria for their comments 
and discussions.

2 Basic Ideas

2.1 A Simple Principle 

Let us consider how the notion of information flow arises in interacting processes,
taking the simplest example. A CCS term $a.\overline{b}.0$ represents a behaviour which
synchronises at $a$ as input, then synchronises at $b$ as output, and does nothing.
Suppose we attach a secrecy level to each port, for example “High” to $a$ and
“Low” to $b$. Intuitively this means that we wish interaction at $a$ to be secret, while
interaction at $b$ may be known by a wider public: any high-level security process
may interact at $a$ and $b$, while a low-level security process can interact only at $b$.
Then this process represents insecure interactions: any process observing $b$, which
can be done by a low-level process, has the possibility to know an interaction
at $a$, so information is indeed transmitted to a lower level from a higher level.
Note that this does not depend on $a$ being used for input and $b$ used for output:
$\overline{a}.b.0$ with the same assignment of secrecy levels is similarly unsafe. In both
cases, we are saying that if there is a causal dependency from an action at a
high-level channel to the one at a low-level channel, the behaviour is not safe
from the viewpoint of information flow. Further, if we have value passing in
addition, we would naturally take dependency in terms of communicated values
into consideration.
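For a single chain of prefixes such as the two-action terms above, this informal principle amounts to a one-pass check. The sketch below is an illustration under strong simplifying assumptions: it handles only a linear sequence of actions, each tagged with one of two levels, and it ignores value passing. The names `level_t` and `prefix_chain_is_safe` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { LOW, HIGH } level_t;

/* levels[i] is the secrecy level of the channel used by the i-th action of
   a prefix chain. Returns 1 if no low-level action causally follows a
   high-level one, and 0 otherwise. */
int prefix_chain_is_safe(const level_t *levels, size_t n)
{
    int seen_high = 0;
    for (size_t i = 0; i < n; i++) {
        if (levels[i] == HIGH) seen_high = 1;
        else if (seen_high) return 0;   /* low action depends on a high one */
    }
    return 1;
}
```

With High attached to a and Low to b, the chain High, Low is rejected whichever of the two actions is the input. The same two actions in the opposite order, Low then High, would be accepted.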

The above informal principle based on causal dependency¹ is simple, but
may look basic as a way of stipulating information flow for processes. Since
many language constructs are known to be representable as interacting processes 
[18,19], one may wonder whether the above idea can be used for understanding 
safety in information flow in various programming languages. In the following, we 
consider this question by taking basic examples of information flow in imperative 
programs. 



2.2 Syntax 

Let $a, b, c, \ldots, x, y, z, \ldots$ range over names (which are both points of interaction
and values to be communicated), and $X, Y, \ldots$ over agent variables. We write
$\vec{y}$ for a vector of names $y_0 \cdots y_{n-1}$ with $n \geq 0$. Then the syntax for processes,
written $P, Q, R, \ldots$, is given by the following grammar. We note that this syntax
extends the standard polyadic π-calculus with branching and recursion. These
extensions play a fundamental role in the type discipline, in that intended types
are hard to deduce if we use their encoding into, say, the polyadic π-calculus
(see [16] for further discussions).



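$$
\begin{array}{rcll}
P & ::= & x(\vec{y}).P & \text{input} \\
  & \mid & \overline{x}\langle(\nu\,\vec{z})\vec{y}\rangle.P & \text{output} \\
  & \mid & x[(\vec{y}).P \,\&\, (\vec{z}).Q] & \text{branching input} \\
  & \mid & \overline{x}\,\mathsf{inl}\langle(\nu\,\vec{z})\vec{y}\rangle.P & \text{left selection} \\
  & \mid & \overline{x}\,\mathsf{inr}\langle(\nu\,\vec{z})\vec{y}\rangle.P & \text{right selection} \\
  & \mid & P \mid Q & \text{parallel} \\
  & \mid & (\nu\,x)P & \text{hiding} \\
  & \mid & 0 & \text{inaction} \\
  & \mid & X\langle\vec{x}\rangle & \text{recursive variable} \\
  & \mid & (\mu X(\vec{x}).P)\langle\vec{y}\rangle & \text{recursion}
\end{array}
$$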



¹ Related ideas are studied in the context of CCS [8] and CSP [31].

There are two kinds of inputs, one unary and another binary: the former is the
standard input in the π-calculus, while the latter, the branching input, has two
branches, waiting for one of them to be selected with associated communication
[32]. Accordingly there are outputs with left and right selections, as well as the
standard one. We require that all vectors of names in round parentheses be pairwise
distinct; these act as binders. In the value part of an output (including
selections), say $\langle(\nu\,\vec{z})\vec{y}\rangle$, names in $\vec{z}$ should be such that $\{\vec{z}\} \subseteq \{\vec{y}\}$ ($\{\vec{x}\}$ is the set of
names in $\vec{x}$), and the order of occurrences of names in $\vec{z}$ should be the same as
the corresponding names in $\vec{y}$. Here $(\nu\,\vec{z})$ indicates that names $\vec{z}$ are new names and
are exported by output. $\langle(\nu\,\vec{z})\vec{y}\rangle$ is written $\langle\vec{y}\rangle$ if $\vec{z} = \emptyset$, and $(\nu\,\vec{z})$ if $\vec{y} = \vec{z}$. We
often omit vectors of length zero (for example, we write inr for inr( )) as
well as the trailing 0. The binding and α-convertibility $\equiv_\alpha$ are defined in the
standard way. In a recursion $(\mu X(\vec{x}).P)\langle\vec{y}\rangle$, we require that $P$ is input guarded,
that is, $P$ is either a unary input or a branching input, and free names in $P$ are
a subset of $\{\vec{x}\}$. The reduction relation $\longrightarrow$ is defined in the standard manner,
which we illustrate below (the formal definition is given in [16]).

We illustrate the syntax by examples. First, the following agents represent 
boolean constants denoting the truth and the conditional selection (let c and y 
be fresh). 

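$$T\langle b\rangle \stackrel{\mathrm{def}}{=} b(c).(\overline{c}\,\mathsf{inl} \mid T\langle b\rangle) \quad\text{and}\quad \mathsf{If}\langle x, P, Q\rangle \stackrel{\mathrm{def}}{=} \overline{x}(\nu\,y).y[().P \,\&\, ().Q]$$

The recursive definition of $T\langle b\rangle$ is a notational convention and actually stands
for $T\langle b\rangle \stackrel{\mathrm{def}}{=} (\mu X(b).b(c).(\overline{c}\,\mathsf{inl} \mid X\langle b\rangle))\langle b\rangle$. The truth agent first inputs a name
$c$ via $b$, then, via $c$, does the left selection with no value passing as well as
recreating the original agent. By replacing inl by inr, we can define the falsity.
The conditional process invokes a boolean agent, then waits with two branches.
If the other party is truth it generates $P$; otherwise it generates $Q$. We can now
show how these two processes interact:

$$\mathsf{If}\langle x, P, Q\rangle \mid T\langle x\rangle \longrightarrow (\nu\,y)(y[().P \,\&\, ().Q] \mid \overline{y}\,\mathsf{inl} \mid T\langle x\rangle) \longrightarrow P \mid T\langle x\rangle$$

Next we consider a representation of an imperative variable as a process.

$$\mathsf{Var}\langle xv\rangle \stackrel{\mathrm{def}}{=} x[(z).(\overline{z}\langle v\rangle \mid \mathsf{Var}\langle xv\rangle) \,\&\, (v').\mathsf{Var}\langle xv'\rangle]$$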

In this representation, we label the main interaction point of the process (called
principal port in Interaction Net [21]) by the name of the variable $x$. It has two
branches, of which the left one corresponds to the “read” option, while the right
one corresponds to the “write” option. If the “read” is selected and $z$ is received,
the process sends the current value $v$ to $z$, while regenerating the original self.
On the other hand, if the “write” branch is selected and $v'$ is received, then
the process regenerates itself with a new value $v'$. We can then consider the
representation of the assignment “x := y,” which first “reads” the value from
the variable $y$, then “writes” that value to the variable $x$.

$$\mathsf{Assign}\langle xy\rangle \stackrel{\mathrm{def}}{=} \overline{y}\,\mathsf{inl}(\nu\,z).z(v).\overline{x}\,\mathsf{inr}\langle v\rangle$$
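As a worked illustration, the assignment can be run against two variable processes. The derivation below is a sketch that assumes the usual reduction rules for selection and communication (the formal definition is in [16]), reads the definitions as $\mathsf{Var}\langle xv\rangle = x[(z).(\overline{z}\langle v\rangle \mid \mathsf{Var}\langle xv\rangle) \,\&\, (v').\mathsf{Var}\langle xv'\rangle]$ and $\mathsf{Assign}\langle xy\rangle = \overline{y}\,\mathsf{inl}(\nu\,z).z(u).\overline{x}\,\mathsf{inr}\langle u\rangle$, and renames the bound value in Assign to $u$ to avoid a clash with the stored value $v$. Here $y$ initially holds $v$ and $x$ holds $w$.

```latex
\begin{align*}
 & \mathsf{Assign}\langle xy\rangle \mid \mathsf{Var}\langle yv\rangle \mid \mathsf{Var}\langle xw\rangle \\
\longrightarrow\ & (\nu\,z)\bigl(z(u).\overline{x}\,\mathsf{inr}\langle u\rangle \mid \overline{z}\langle v\rangle \mid \mathsf{Var}\langle yv\rangle\bigr) \mid \mathsf{Var}\langle xw\rangle
   && \text{left selection (read) at } y \\
\longrightarrow\ & \overline{x}\,\mathsf{inr}\langle v\rangle \mid \mathsf{Var}\langle yv\rangle \mid \mathsf{Var}\langle xw\rangle
   && \text{communication at } z \\
\longrightarrow\ & \mathsf{Var}\langle yv\rangle \mid \mathsf{Var}\langle xv\rangle
   && \text{right selection (write) at } x
\end{align*}
```

After three steps $x$ holds the value of $y$, while $y$ is unchanged; the restriction on $z$ is dropped once $z$ no longer occurs, and the trailing 0 is omitted.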




Secure Information Flow as Typed Process Behaviour 






2.3 Imperative Information Flow in Process Representation 

(1) Causal Dependency. We can now turn to the information flow. We first
consider the process representation of the following obviously insecure code [25].

$$x^{L} := y^{H}$$

Here the superscripts “L” and “H” indicate the secrecy levels of variables: thus $y$
is a high (or secret) variable and $x$ is a low (or public) variable. This command is
insecure intuitively because the content of a secret variable becomes visible to the
public through $x$. Following the previous discussion, its process representation
becomes:

$$\mathsf{Assign}\langle x^{L}y^{H}\rangle \stackrel{\mathrm{def}}{=} \overline{y}^{H}\,\mathsf{inl}(\nu\,c^{H}).c^{H}(v).\overline{x}^{L}\,\mathsf{inr}\langle v\rangle$$

Note we are labeling channels by secrecy levels. We can easily see that this
process violates the informal principle stipulated in §2.1, because its low-level
behaviour (at $x$) depends on its preceding high-level behaviour (at $y$, $c$). Thus this
example does seem explainable from our general principle. Similarly, we can check
that the well-known example of implicit insecure flow “if $z^{H}$ then $x^{L} := y^{L}$ end”
(where the information stored in $z$ can be indirectly revealed by reading $x$) is
translated into the insecure process interaction “$\overline{z}^{H}(\nu\,c).c^{H}[().\mathsf{Assign}\langle x^{L}y^{L}\rangle \,\&\, ().0]$”.
Here again the low-level interactions (in $\mathsf{Assign}\langle x^{L}y^{L}\rangle$) depend on the high-level
interactions at $z$ and $c$.

(2) Deadlock-Freedom. So far there has been no difficulty in applying our
general principle to process representation of imperative information flow.
However there are subtleties to be understood, one of which arises in the following
sequential composition.

$$x^{H} := y^{H} \,;\; z^{L} := w^{L}$$

The whole command is considered to be safe since whatever the content of $x$
and $y$ would be, they do not influence the content of $z$ and $w$. However the
following process representation of this command seems not safe in the light of
our principle:

$$\overline{y}^{H}\,\mathsf{inl}(\nu\,c_1).c_1^{H}(v_1).\overline{x}^{H}\,\mathsf{inr}\langle v_1\rangle.\overline{w}^{L}\,\mathsf{inl}(\nu\,c_2).c_2^{L}(v_2).\overline{z}^{L}\,\mathsf{inr}\langle v_2\rangle \qquad (*)$$

Here the behaviours at low-level ports ($w$ and $z$) depend on, via prefixing, those
at high-level ports ($x$ and $y$). Does this mean our principle and the standard idea
in information flow are incompatible with each other? However, a closer look at
the above representation reveals that this problematic dependency does not exist
in effect, provided that the above process interacts with the processes for
imperative variables given in §2.2. If we assume so, the actions at $y$ and $x$ (together
with those at $z$ and $w$) by the above process are always enabled: whenever a
program wishes to access a variable, it always succeeds (in π-calculus parlance, we
are saying that interactions at these names are guaranteed to be deadlock-free).
Thus we can guarantee that, under the assumption, the action at say $w$ above
will surely take place, which means the dependency as expressed in syntax does
not exist. Observing there is no dependency at the level of communicated values
between the two halves of (*), we can now conclude that the actions at $w$ and $z$
do not causally depend on the preceding actions at $y$ and $x$.

(3) Innocuous Interaction. We now move to another subtle example, using 
the following command. 

if then := end 

While this phrase is considered to be secrecy- wise safe [25], its representation in 
the TT-calculus becomes: 

z^(i/c^).c[().y'"inl(i/e).e'"(t)).x^ inr(ri) & ().0j (**) 

which again shows an apparently unsafe dependency between the second action at
c and the third action at y. In this example, the process does get information
at c in the form of binary selection, even though c is deadlock-free. Moreover,
the output at y does not occur in the right branch, so the output depends on
the action at c even observationally. But the preceding study [33,35] shows the
original imperative behaviour is indeed safe. How can this be? Simply because
this command only reads from y, without writing anything: so it is as if it did
nothing to y. Returning to (**), we find that the idea we resorted to in (2) is
again effective: we consider this output action as not affecting the environment
(hence not transmitting any information) provided that the behaviour of the
environment is such that invoking its left branch has no real effect; in other
words, if it behaves just as the imperative variable given in §2.2 does. We call
such an output innocuous: thus, if we decide to ignore the effect of innocuous
actions, there is no unsafe dependency from the high level to the low level (note
the left branch as a whole now becomes high-level). We further observe that
the insecure examples in (1) are still insecure even after incorporating
deadlock-freedom and innocuousness.
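
The notion of an innocuous access can be illustrated outside the calculus. The following is a minimal sketch, not the encoding of §2.2: it assumes the command tests a high-level variable z and copies the low-level y into the high-level x, and the class `BoolVar` and the function `conditional_copy` are hypothetical names introduced only for this illustration.

```python
class BoolVar:
    """Toy imperative boolean variable that counts state-changing accesses."""

    def __init__(self, value: bool) -> None:
        self.value = value
        self.state_changes = 0  # number of accesses that altered the variable

    def read(self) -> bool:
        # Reading leaves the variable exactly as it was: an innocuous access.
        return self.value

    def write(self, new_value: bool) -> None:
        # Writing may change what later readers see, so it is not innocuous.
        self.value = new_value
        self.state_changes += 1


def conditional_copy(z: BoolVar, x: BoolVar, y: BoolVar) -> None:
    """Assumed shape of the command: if z then x := y end."""
    if z.read():
        x.write(y.read())
```

Whatever value z holds, y records no state change after `conditional_copy`, which is the informal reason why the access to y carries no information to a low-level observer.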

The preceding discussions suggest two things: first, we may be able to formally
stipulate the interactional framework of safe information flow which may
have wide applicability along the line of the informal notion given in §2.1.
Secondly, however, just for that purpose, we need a non-trivial notion of types for
behaviours which in particular concerns not only the behaviour of the process
but also that of the assumed environment. The formal development in the
following sections shows how these ideas can be materialised as a typed process
calculus for safe information flow.

3 A Typed π-Calculus for Secure Information Flow

3.1 Overview 

In addition to names and agent variables (cf. §2.1), the typed calculus we
introduce below uses a set of multiple secrecy levels, which are assumed to form
a lattice. $s, s', \ldots$ range over secrecy levels, and $s \le s'$ etc. denotes the partial
order (where the lesser means the lower, i.e. more public). Using these data as
base sets, our objective in this section is to introduce a typing system whose
provable sequent has the following form:

$\Gamma \vdash_s P \triangleright A$     a process P has an action type A under a base $\Gamma$ with a secrecy level s

We offer an overview of the four elements in the above sequent. 

(1) The base $\Gamma$ is a finite function from names and agent variables to types and
vectors of types, respectively. Intuitively a type assigned to a channel denotes the 
basic structure of possible interaction at that channel, for example input/output 
and branching/selection. We also include refined modalities for recursive inputs 
and their dual outputs, which indicate whether they involve state change or not. 

(2) The process P is an untyped term in §2.2 which is annotated with types
in its bound names, e.g. a unary input becomes $x(\vec{y}:\vec{\alpha}).P$ (here and elsewhere
we assume $\mathrm{len}(\vec{\alpha}) = \mathrm{len}(\vec{y})$, where $\mathrm{len}(\vec{y})$ denotes the length of a vector, so that
each $y_i$ is assigned a type $\alpha_i$). As one notable aspect, we only use those processes
whose outputs (in any of three forms) are bound, e.g. each unary output has the
form $\bar{x}(\nu\,\vec{y}:\vec{\alpha}).P$ (this restricted output is an important mode of communication
which arises in the context of both the π-calculus [30] and games semantics [19,18]).
Accordingly we set names in each vector instantiating agent variables to be
pairwise distinct. These restrictions make typing rules simpler, while giving enough
descriptive power to serve our present purpose.

(3) The secrecy index s guarantees that P under $\Gamma$ only affects the environment
at levels s or higher: that is, it only transmits information (or tampers with
the environment) at levels no lower than s.

(4) The action type A gives abstraction of the causal dependency among (actions
on) free channels in P, ensuring, among others, certain deadlock-free properties
on its linear and recursive channels. The activation ordering is represented by a
partial order on nodes whose typical form is $px$, where $p$ denotes a type of action
to be done at $x$. There is a partial algebra over action types [15], by which we can
control the composability of two action types (hence of typed processes which
own them), thus enabling us to stipulate assumptions on the possible forms of
the environments, cf. §2.
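
As an illustration of the secrecy index in (3), the following minimal sketch models secrecy levels as a powerset lattice. The choice of lattice, the compartment names, and the helper `respects_index` are assumptions made only for this example; the calculus itself works with an arbitrary lattice.

```python
from typing import FrozenSet, Iterable

Level = FrozenSet[str]  # a secrecy level, modelled here as a set of compartments

BOTTOM: Level = frozenset()                     # the most public level
TOP: Level = frozenset({"finance", "medical"})  # the most secret level


def leq(s: Level, t: Level) -> bool:
    """Partial order: s is lower (more public) than or equal to t."""
    return s <= t


def join(s: Level, t: Level) -> Level:
    """Least upper bound of two levels."""
    return s | t


def meet(s: Level, t: Level) -> Level:
    """Greatest lower bound of two levels."""
    return s & t


def respects_index(index: Level, affected: Iterable[Level]) -> bool:
    """True if every affected level is at the index or higher."""
    return all(leq(index, level) for level in affected)
```

With this reading, a process with index `{"finance"}` may affect the levels `{"finance"}` and `TOP`, but not `{"medical"}`, which is incomparable with its index.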

3.2 Types and Subtyping 

We start with the set of action modes, denoted $m, m', \ldots$, whose underlying
operational ideas are illustrated by the following table.

  $\downarrow$  non-linear (non-deterministic) input     $\uparrow$  non-linear (non-deterministic) output
  $\Downarrow$  truly linear input (truly once)          $\Uparrow$  truly linear output (truly once)
  $!$  recursive input (always available)       $?$  zero or more output (always enabled)

The notations ! and ? come from Linear Logic [10], which first introduced these
modalities. We also let $\kappa, \kappa', \ldots$, called mutability indices, range over $\{\iota, \mu\}$.
[Figure 1: inference rules in two groups, (Well-formedness and Compatibility) and (Subtyping).]

Fig. 1. Subtyping



Mutability indices indicate whether a recursive behaviour is stateful or not: for
input, $\iota$ denotes the lack of state, which we call innocence, cf. [19], while $\mu$ means
it may be stateful, that is, it may change behaviour after invocation; for output,
$\iota$ denotes innocuousness, that is, the inputting party is innocent, while $\mu$
denotes possible lack of innocuousness. Given these base sets, the grammar of types,
denoted $\alpha, \beta, \ldots$, is given by:

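$\alpha ::= \tau \mid \langle\tau, \tau'\rangle \qquad \tau ::= \tau_I \mid \tau_O$

$\tau_I ::= (\vec{\tau})^{\downarrow}_{s} \mid (\vec{\tau})^{\Downarrow}_{s} \mid (\vec{\tau})^{!}_{s,\kappa} \mid [\vec{\tau}_1 \& \vec{\tau}_2]^{\downarrow}_{s} \mid [\vec{\tau}_1 \& \vec{\tau}_2]^{\Downarrow}_{s} \mid [\vec{\tau}_1 \& \vec{\tau}_2]^{!}_{s,\kappa_1 \& \kappa_2}$
$\tau_O ::= (\vec{\tau})^{\uparrow}_{s} \mid (\vec{\tau})^{\Uparrow}_{s} \mid (\vec{\tau})^{?}_{s,\kappa} \mid [\vec{\tau}_1 \oplus \vec{\tau}_2]^{\uparrow}_{s} \mid [\vec{\tau}_1 \oplus \vec{\tau}_2]^{\Uparrow}_{s} \mid [\vec{\tau}_1 \oplus \vec{\tau}_2]^{?}_{s,\kappa_1 \oplus \kappa_2}$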

Types of form $\langle\tau, \tau'\rangle$ are pair types, indicating structures of interaction for both
input and output, while others are single types, which are only for either input or
output. We write $\mathrm{md}(\alpha)$ for the set of action modes of the outermost type(s) in
$\alpha$, e.g. $\mathrm{md}((\vec{\tau})^{m}_{s}) = \{m\}$ and $\mathrm{md}(\langle(\vec{\tau}_1)^{m_1}_{s_1}, (\vec{\tau}_2)^{m_2}_{s_2}\rangle) = \{m_1, m_2\}$. We often write
$\mathrm{md}(\alpha) = m$ for $\mathrm{md}(\alpha) = \{m\}$. Similarly, we write $\mathrm{sec}(\tau)$ for the security level of
the outermost type in $\tau$, e.g. $\mathrm{sec}((\vec{\tau})^{m}_{s}) = s$. We define the dual of $m$, written
$\overline{m}$, as: $\overline{\Downarrow} = \Uparrow$, $\overline{\Uparrow} = \Downarrow$, $\overline{\downarrow} = \uparrow$, $\overline{\uparrow} = \downarrow$, $\overline{!} = ?$ and $\overline{?} = !$. Then the dual of a type $\alpha$,
denoted by $\overline{\alpha}$, is given by inductively dualising each action mode in $\alpha$, as well
as exchanging $\&$ and $\oplus$. Among types, those with body $(\vec{\tau})$ correspond to unary
input/output, those with body $[\vec{\tau}_1 \& \vec{\tau}_2]$ correspond to branching input, and those
with body $[\vec{\tau}_1 \oplus \vec{\tau}_2]$ correspond to output with selections.
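
The inductive definition of duality can be sketched as follows. This is an illustration under simplifying assumptions: secrecy levels and mutability indices are omitted, and the names `Mode`, `Unary`, `Binary`, `dual_mode` and `dual_type` are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple, Union


class Mode(Enum):
    NONLINEAR_IN = "nonlinear input"
    NONLINEAR_OUT = "nonlinear output"
    LINEAR_IN = "truly linear input"
    LINEAR_OUT = "truly linear output"
    REC_IN = "!"
    REC_OUT = "?"


# Each input mode is paired with its output mode; the table is then made symmetric.
_DUAL = {
    Mode.NONLINEAR_IN: Mode.NONLINEAR_OUT,
    Mode.LINEAR_IN: Mode.LINEAR_OUT,
    Mode.REC_IN: Mode.REC_OUT,
}
_DUAL.update({out: inp for inp, out in list(_DUAL.items())})


def dual_mode(mode: Mode) -> Mode:
    return _DUAL[mode]


@dataclass(frozen=True)
class Unary:
    mode: Mode
    args: Tuple["Type", ...] = ()


@dataclass(frozen=True)
class Binary:
    mode: Mode
    connective: str  # "&" for branching, "+" for selection
    left: Tuple["Type", ...] = ()
    right: Tuple["Type", ...] = ()


Type = Union[Unary, Binary]


def dual_type(t: Type) -> Type:
    """Dualise every mode in t, nested ones included, and swap & with +."""
    if isinstance(t, Unary):
        return Unary(dual_mode(t.mode), tuple(dual_type(a) for a in t.args))
    flipped = "+" if t.connective == "&" else "&"
    return Binary(
        dual_mode(t.mode),
        flipped,
        tuple(dual_type(a) for a in t.left),
        tuple(dual_type(a) for a in t.right),
    )
```

Since every mode is paired with exactly one other mode, applying `dual_type` twice returns the original type.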

We say $\alpha$ is well-formed, written $\vdash \alpha$, if it is derivable from the rules in
Figure 1, where we also define the compatibility relation $\asymp$ over single types. A
pair type is well-formed iff its constituting single types are compatible. We also
say $\alpha$ is a subtype of $\beta$, denoted $\vdash \alpha \le \beta$, if this sequent is derivable by the rules
in Figure 1. Some comments on types, subtyping and compatibility follow.

Remark 1. (nested types) Nested types denote what the process would do
after exporting or importing new channels (hence covariance of subtyping on
nested types): as an example, neglecting the secrecy and mutability, $x : (()^{\Downarrow})^{\Uparrow}$
denotes the behaviour of doing a truly linear output at x exporting one single
new name, and at that name doing a truly linear input without importing any
name.

(secrecy levels, compatibility and subtyping) Since safe information flow
should never go from a higher level to a lower level, a rule of thumb is that
two types are compatible if such a flow is impossible. Thus, because a flow can
occur in both ways at non-deterministic channels (cf. §2.1), two non-linear types
can be related only when they have the same secrecy level. On the other hand,
for compatibility of linear types, we require that the inputting side is higher
than the outputting side in secrecy levels, since the flow never comes from the
inputting party (further, in truly linear unary types, even the outputting party
does not induce flow). Accordingly, the subtyping is covariant for output and
contravariant for input with respect to secrecy levels.

(mutability index) As we explained already, the index $\iota$ represents the
recursive input behaviour without state change (innocence) or, dually, the output
which does not tamper with the corresponding recursive processes (innocuousness).
Note an index is only meaningful for recursive behaviours and their dual output.
Naturally we stipulate that an innocent input can only be compatible with an
innocuous output; and an innocent input can only be a subtype of an innocent
input, and an innocuous output can only be a subtype of an innocuous output.
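
The rule of thumb on secrecy levels in the remark above can be sketched as a pair of checks. The sketch assumes that levels form a chain of integers (larger means more secret), reads "higher" as "higher or equal", and ignores nested types and mutability indices; the function names are hypothetical.

```python
def compatible_levels(linear: bool, input_level: int, output_level: int) -> bool:
    """Secrecy-level part of compatibility between an input and an output type."""
    if not linear:
        # Flow can go both ways at non-linear channels: the levels must coincide.
        return input_level == output_level
    # Flow never comes from a linear inputting party: it may sit at a higher level.
    return input_level >= output_level


def linear_output_subtype(level: int, other: int) -> bool:
    """Covariance for output: the level may be raised."""
    return level <= other


def linear_input_subtype(level: int, other: int) -> bool:
    """Contravariance for input: the level may be lowered."""
    return level >= other
```

For instance, `compatible_levels(True, 1, 0)` holds, whereas `compatible_levels(False, 1, 0)` does not.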

3.3 Action Types 

An action type A is a finite poset whose elements, called action nodes, are given
by the following grammar.

$n ::= {\Downarrow}x \mid {\Uparrow}x \mid {\Updownarrow}x \mid {!}x \mid {?}x \mid {?^{\iota}}x \mid {\updownarrow}x \mid X\langle\vec{x}\rangle$

${\Updownarrow}x$ indicates x is already used exactly once for both input and output. ${?^{\iota}}x$
indicates that all actions occurring at x so far are innocuous. $X\langle\vec{x}\rangle$ (with
$\mathrm{len}(\vec{x}) \ge 1$ always) indicates the point to which the behaviour recurs. ${\updownarrow}x$
indicates possibility of nonlinear (nondeterministic) input and output. Other
symbols are already explained in the table in §3.2. As an illustration of
causality, write $n \to n'$ when $n'$ is strictly bigger than $n$ without any
intermediate element. Then ${\Downarrow}x \to {\Uparrow}y$ says that a truly linear output at y
becomes active just after a truly linear input at x.

We only use those action types which conform to a well-formedness condition
that in particular includes linearity (for details see [16]). In the typing rules, we
use the following abbreviations for action types (let $\{x_i\}$ be the free names in A).

4_i'A A only contains \,Xi or j'Xi A~^ x does not occur in A 

?A A only contains ?Xi, l''Xi or -i^Xi A® B disjoint union, with AC\ B = % 

l‘'A A only contains 7''Xi pa; poa;o 0 Pia;i • • • p„_ix„_i (n > 0) 

We also say x is active in A if $px$ (for some $p$) is minimal in A.
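
The notion of an active name can be made concrete with a small sketch in which an action type is given by its nodes and its strict ordering. The representation (pairs of a mode label and a channel name, and an explicit set of ordered pairs) and the function name `active_names` are assumptions for illustration only.

```python
from typing import Set, Tuple

Node = Tuple[str, str]  # (mode label, channel name), e.g. ("linear-in", "x")


def active_names(nodes: Set[Node], below: Set[Tuple[Node, Node]]) -> Set[str]:
    """Names whose node is minimal; (n, n2) in `below` means n is strictly below n2."""
    non_minimal = {upper for (_, upper) in below}
    return {name for (mode, name) in nodes if (mode, name) not in non_minimal}


# A truly linear input at x that precedes a truly linear output at y.
x_in: Node = ("linear-in", "x")
y_out: Node = ("linear-out", "y")
example_nodes = {x_in, y_out}
example_order = {(x_in, y_out)}
```

Here `active_names(example_nodes, example_order)` contains only `"x"`, since the output at y becomes active only after the input at x.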

3.4 Typing System 

We now introduce the main typing rules with illustration. We use the following
notation: given a base $\Gamma$, (1) $x : \alpha$ (resp. $X : \vec{\alpha}$) denotes $\Gamma(x) = \alpha$ (resp.
$\Gamma(X) = \vec{\alpha}$); and (2) $\Gamma \cdot \Delta$ denotes the disjoint union of two bases, assuming
their domains do not intersect. Henceforth we assume all types and bases are
well-formed. We start from the typing rules for basic process operators: the
inaction, parallel composition and hiding.

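(Zero)  $\dfrac{}{\Gamma \vdash_s \mathbf{0} \triangleright \emptyset}$

(Par)   $\dfrac{\Gamma \vdash_s P_i \triangleright A_i \ (i = 1,2) \qquad A_1 \asymp A_2}{\Gamma \vdash_s P_1 \mid P_2 \triangleright A_1 \odot A_2}$

(Res)   $\dfrac{\Gamma \cdot x : \alpha \vdash_s P \triangleright A \otimes px \qquad p \in \{\Updownarrow, !, \updownarrow\}}{\Gamma \vdash_s (\nu\, x : \alpha)P \triangleright A}$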

In (Par), we use coherence $A_1 \asymp A_2$ and composition $A_1 \odot A_2$, both following
[36]. Essentially speaking, $A_1 \asymp A_2$ says $A_1$ and $A_2$ are composable without
violating linearity or causing vicious circles; then $A_1 \odot A_2$ is the result of the
composition. See [16] for details. In (Res), we do not allow a name with a mode
in $\{\Downarrow, \Uparrow, ?, ?^{\iota}\}$ to be restricted since these actions expect their complementary
actions to get composed; in other words, actions with these types assume
the existence of actions with their dual types in the environment. With the
complementary actions left uncomposed, the hiding leads to an insecure system.
In addition, we have the weakening rules for ${?}x$, ${?^{\iota}}x$, ${\Updownarrow}x$ and ${\updownarrow}x$, and the
degradation rule in which $\Gamma \vdash_s P \triangleright A$ is degraded into $\Gamma \vdash_{s'} P \triangleright A$ when $s' \le s$
(cf. §3.1 (3)).

We next turn to non-linear prefix rules. The rules for prefix actually control
the secrecy levels of each action.

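(In)   $\dfrac{\vdash (\vec{\tau})^{\downarrow}_{s} \le \Gamma(x) \qquad \Gamma \cdot \vec{y} : \vec{\tau} \vdash_s P \triangleright \vec{p}\vec{y} \otimes {?}A \otimes {\updownarrow}x}{\Gamma \vdash_s x(\vec{y} : \vec{\tau}).P \triangleright A \otimes {\updownarrow}x}$

(Out)  $\dfrac{\vdash (\vec{\tau})^{\uparrow}_{s} \le \Gamma(x) \qquad \Gamma \cdot \vec{y} : \vec{\tau} \vdash_s P \triangleright \vec{p}\vec{y} \otimes {?}A \otimes {\updownarrow}x}{\Gamma \vdash_s \bar{x}(\nu\,\vec{y} : \vec{\tau}).P \triangleright A \otimes {\updownarrow}x}$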

Since the subtyping on non-linear types is trivial with respect to their secrecy
levels, $\vdash (\vec{\tau})^{\downarrow}_{s} \le \Gamma(x)$ means $\Gamma(x)$ has precisely the level s. Thus, in both rules,
the initial action at level s is followed by actions affecting the same or higher
levels (because P is typed with s). Note also that all abstracted actions ($\vec{p}\vec{y}$ above)
should be active, which is essential for the subject reduction. Non-linear prefix
rules for branching and selections are essentially the same.

Among linear prefix rules, the following shows a stark contrast with the non- 
linear (In) and (Out) rules. 



(Inb) {where C/y =^B) 

^ (r)t. < P{x) 
r-y-.Ths P>?A(g)C'-"= 

r \~s x{y:f).P 0 Ao 



(Out'l) (where C/y =4'1'5) 

^ (r)I, < P{x) 

P-y-.Ths P\>1A®C~^ 

P hs x{v y\T).P > A® ^x^B 



The notation $C/\vec{y}$ denotes the result of taking off nodes with names among $\vec{y}$, as
well as stipulating the condition that each $y_i$ should be active in C. We observe
that the “true linearity” in these and later rules is stronger than those studied in
[15,20], which only require “no more than once”. In the rules, since s' is not given
any condition in the antecedent, both rules completely neglect the secrecy level
of x in $\Gamma$, saying we may not regard these actions as either receiving or giving
information from/to the environment. The operation $n \to B$, which is given in
[16] following [36], records the causality.

The next rules show that branching/selection need a different treatment from 
the unary cases when types are truly linear. Intuitively, the act of selection gives 
rise to a non-trivial flow of information. 



(Bra'*') (where Ci/j7i =TfB) 

b [ri&r2]i < r{x) 

r ■ Vi'-Ti bs Pi > ?A (g) C~A (i = 1, 2) 
r \~s x[{yi'.Ti)-Pi & (j/2 :t2).P2] 0 A® ^x^B 



(Sel)*) (where C/j7i =TfP) 

b [ri©T 2 ] 1 ' < r{x) 
r-yi-.A b. P>?A(g)C'"" 

r bs x±n\{vy\ :ri).P > A(g) '\x^B 



Here the subtyping is used non-trivially: in (Bra$^{\Downarrow}$), the real level of x in $\Gamma$ is
the same or lower than s, so the level elevates. In (Sel$^{\Uparrow}$), the real level of x is the
same or higher, so the level may go down, but it is recorded in the conclusion. It
is notable that this inference crucially depends on the employment of branching
as a syntactic construct: without it, these rules should have the same strict
conditions as non-linear prefixes.

The final class of rules show the treatment of !-? modalities and mutability 
indices, dealing with recursive inputs and their dual outputs, and are most in- 
volved. We first have the variable introduction rule (Var'), in which we derive 
P ■ X : d \~s X{x) > X{x) when we have both b < P{xi) and md(a;o) = !> 
as well as (for consistency with repetitive invocation) md(ai) G {?,j|,fr} {i A 0). 
Here we give no restriction on s since when the introduced variable is later
bound, all potential tampering at free names would have been recorded except the
subject of this recursion, the latter not being tampering. Below we introduce
linear recursion rules, for which there are two pairs, one for unary prefix and
another for binary prefix. We show the rules for unary input/output.
(In’) (Our) (where C/i/ =4t-B)

^(r)U<r(zo) ha, <r(^o ^{f)l^^<r{x) pe{?,?^ 

(r{x/^-y:T-X :d b P > ^®?''H{®/5'}(g)Y(f) (« = t) F ■ y:r b P > ?H(g)C'(g)pa; 
\r{x/z}-y:T-X :a hs P > p^^?A{x/ z}^X (xw){k — /i) K = fi=> {s = s' Ap~?) 

r\-s {y,X{x-.d).xo{y.T).P){z}>\zo® A F'^ s>x{v y\T).PA®B®Y’X 

In (In$^{!}$), we check that the process is immediately recurring to precisely the same
behaviour ($X\langle\vec{x}\rangle$) if it is innocent, or, if it is not innocent, it recurs to the same
subject ($X\langle x_0\vec{w}\rangle$). The process can only do free actions with $?^{\iota}$-modes in the
innocent case in addition to the recurrence (except at $\vec{y}$, which are immediately
abstracted), so that the process is stateless in its entire visible actions. In the
conclusion, the new subject $z_0$ is introduced with the mode !. In the dual (Out$^{?}$),
if the prefix is an innocuous output ($\kappa = \iota$), there is no condition on the level of x
(s'), so that the level is not counted either in the antecedent or in the conclusion
(e.g. even if $s' = \bot$ we can have $s \ne \bot$): we are regarding the action as not
affecting, and not being affected by, the environment. However if the action is
not innocuous ($\kappa = \mu$), it is considered as affecting the environment, so that we
record its secrecy level by requiring $s' = s$. Note that, even if it is unary, a ?-mode
output action may indeed affect the environment simply because such an action
may or may not exist: just as a unary non-deterministic input/output induces
information flow. The corresponding rules for the branching and selection are
defined in the same way, see [16].



3.5 Examples of Typing 

(Non-linear) Let sync^ ()f. Then a:sync^, • 6:sync^ hg/ a.6 o for 

s' < s. 

(Truly linear) Let sync^ ()^, and its dual sync| ()|. Then, for ar- 
bitrary s and s', we have a: sync| • 6: sync’^, I — r a-b > — 14 - 6 . 

(Branching) Let boolg ([ 0 ]|)[, be the type of a boolean constant. Then 
we have 6:boolg hg T(5) > lb. For the conditional If(6, Pi,P2) introduced in § 2 , 
suppose that the two branches Pi and P2 can be typed at a security level above 
that of the boolean constant 6; that is. Pi is such that P -feiboolj, hg Pit>l A®l''b, 
for s' < s. Then P ■ &:boolJ, hg If(6, Pi,P2) >H(g)?''6. The innocuousness at b is 
crucial to show that (bool],,)y < (boolg,)J, in rule Out’’’. 

(Copy-cat) The following agent concisely represents the idea of safe informa- 
tion flow in the present calculus. It also serves as a substitute for free name 
passing for various purposes, including the imperative variable below. 

[P ^ b'^ ] = 6(c : boolJ).(If( 5 ',cinl,cinr) I [6 ^ 6']) 

This agent transforms a boolean behaviour from $b'$ to $b$. If $s' \le s$, then we have:
6:boolg, b' :boolJ, hg \b •<— b'] t> \b®l''b' . 
(Imperative variable) We give a representation of an imperative variable,
alternative to that presented in §2.

YsLr{x‘‘b“ }'* = x\{z: (bools)X).(a(i/ fobbool^ ).[&^ t— &]| Var(a;fe)) & ih' :boolg ).Var(a;6^)] 

By the copy-cat, sending a new b' has the same effect as sending b. To type 
this process, let var), [(boolJ,)J&boolJ]J, Then x : var^, b : bool|,, \~s 
Var(x®&) > \x®l‘'b for s' < s. Note b has the level s' but the secrecy index is 
still s, since at b the output is innocuous. 

(Assignment) The following offers the typing of the behaviour representing 
x^ := . Let varj var^ and T = a;:var|[ • y:boolL. Then 

r [“H yinl(z: (boolL)L).2:(6: booly). 3 ;inr(&' :boolJj).[&' ^ h]t>lx® I'^y. 



4 Elementary Properties of Typed Processes 

This section presents the most basic syntactic properties of typed terms. We 
also briefly discuss one key behavioural property typed terms enjoy. First, the 
typing system satisfies standard properties such as weakening, strengthening and
substitution closure. We only list two important properties. Below (1) says that
every typable term has a canonical typing, i.e. whenever P is typable, P has the 
minimum action type and the highest secrecy index, and (2) means that channel 
types in P represent the constraints on the behaviour of P, rather than that of the 
outside environment (below A <g A' iff A = Ap(g)?''x and A' = AQ(g)?x(g)lJ;y(g)5r^ 
for some Aq). 

Proposition 1. (1) (canonical typing) If P \~s P A, then there exists sq 
Aq such that P hsg Pt> Aq, and whenever P Ai we have si < sq 

Aq <g Ai . 

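(2) (subsumption-narrowing) If $\Gamma \cdot x : \alpha \vdash_s P \triangleright A$ and $\alpha \le \alpha'$, then
$\Gamma \cdot x : \alpha' \vdash_s P \triangleright A$.

Also if $\Gamma \cdot X : \vec{\alpha} \vdash_s P \triangleright A$ and $\alpha_i \ge \beta_i$ for each i, then $\Gamma \cdot X : \vec{\beta} \vdash_s P \triangleright A$.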

A fundamental property of the typing system follows. Below $\twoheadrightarrow$ is the multi-step
reduction over preterms, defined just as that over untyped terms.

Theorem 1. (subject reduction) If $\Gamma \vdash_s P \triangleright A$ and $P \twoheadrightarrow Q$ with
$\mathrm{bn}(Q) \cap \mathrm{fn}(\Gamma) = \emptyset$, then $\Gamma \vdash_s Q \triangleright A$.

The theorem says that whatever internal reduction takes place, its composability
with the outside, which is controlled by both $\Gamma$ and A, does not change; and
that, moreover, the process is still secure with a secrecy index that is no lower.
For the proof, see [16].

The subject reduction is the basis of various significant behavioural properties
for typed processes. Here we discuss only one of them, a non-interference
property in typed terms (cf. [1,11,25]). A $(\Gamma \cdot s \cdot A)$-context is a typed context whose
hole is typed under the triple $(\Gamma, s, A)$. Then, with respect to security level s,
we can define the s-sensitive maximum sound typed congruence (cf. [17,28,36]),
denoted $\cong_s$, following the standard construction (see [16] for the full definition).
We then obtain:

(behavioural non-interference) Let $C[\,\cdot\,]$ be a $(\Gamma_0 \cdot s_0 \cdot A_0)$-context.

If $s < s_0$ and $\Gamma_0 \vdash_{s_0} P_i \triangleright A_0$ $(i = 1,2)$, then $C[P_1] \cong_s C[P_2]$.

The statement says that the behaviour of the whole at lower levels is never
affected by its constituting behaviours which only act at higher levels. The proof
uses a secrecy-sensitive version of typed bisimilarity, which is a fundamental 
element of the present theory and which turns out to be a subcongruence of the 
above maximum sound equality at each secrecy level. By noting ground constants 
are representable as constant behaviours, one may say the result extends Abadi’s 
non-interference result for ground values [1] to typed process behaviours. 

5 Imperative Information Flow as Typed Process Behaviour

5.1 A Multi-Threaded Imperative Calculus 

Smith and Volpano [33] presented a type discipline for a basic multi-threaded 
imperative calculus in which well-typedness ensures secure information flow. In 
this section we show how the original system can be embedded in the typed 
calculus introduced in this paper, with a suggestion for a practically interesting 
extension of the original type discipline through the analysis of the notion of 
observables. We start with the syntax of untyped phrases of the original calculus, 
using x,y,z, . . . for imperative variables. 

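$e ::= x \mid b \mid e_1\ \mathtt{and}\ e_2 \qquad b ::= \mathtt{tt} \mid \mathtt{ff}$

$c ::= x := e \mid c_1 ; c_2 \mid c_1 \,|\, c_2 \mid \mathtt{if}\ e\ \mathtt{then}\ c_1\ \mathtt{else}\ c_2 \mid \mathtt{while}\ e\ \mathtt{do}\ c \mid \mathtt{skip}$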

For simplicity we restrict data types to booleans. We also added the skip com- 
mand, and use the parallel composition rather than a system of threads. 

The typing system is given in Figure 2. It uses command types of form

$\rho ::= s\ \mathrm{cmd}{\Downarrow} \mid s\ \mathrm{cmd}{\Uparrow}$.

Here $s\ \mathrm{cmd}{\Downarrow}$ (resp. $s\ \mathrm{cmd}{\Uparrow}$) indicates convergent (resp. divergent) phrases and
$s, s', \ldots$ are secrecy levels as before. Note we take secrecy levels from an arbitrary
lattice rather than from the two-point one. We also use a base E, which is a
finite map from variables to secrecy levels. Subsumption in expressions is merged
into their typing rules for simplicity. Notice the contravariance in the first two
subtyping rules [33,35] and the invariance in the last rule. The types in the
original system are embedded into the command types above by setting:

$(H)^{\circ} \stackrel{\mathrm{def}}{=} \top \qquad (L)^{\circ} \stackrel{\mathrm{def}}{=} \bot \qquad (H\ \mathrm{cmd})^{\circ} \stackrel{\mathrm{def}}{=} \top\ \mathrm{cmd}{\Downarrow} \qquad (L\ \mathrm{cmd})^{\circ} \stackrel{\mathrm{def}}{=} \bot\ \mathrm{cmd}{\Uparrow}$
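(Subtyping)

$\dfrac{s' \le s}{s\ \mathrm{cmd}{\Downarrow} \le s'\ \mathrm{cmd}{\Downarrow}}$ $\qquad$ $\dfrac{s' \le s}{s\ \mathrm{cmd}{\Downarrow} \le s'\ \mathrm{cmd}{\Uparrow}}$ $\qquad$ $\dfrac{}{s\ \mathrm{cmd}{\Uparrow} \le s\ \mathrm{cmd}{\Uparrow}}$

(Typing)

(var)       $\dfrac{E(x) \le s}{E \vdash x : s}$

(bool)      $\dfrac{}{E \vdash b : s}$

(and)       $\dfrac{E \vdash e_i : s \ (i = 1,2)}{E \vdash e_1\ \mathtt{and}\ e_2 : s}$

(skip)      $\dfrac{}{E \vdash \mathtt{skip} : s\ \mathrm{cmd}{\Downarrow}}$

(assign)    $\dfrac{E \vdash e : s \qquad E(x) = s}{E \vdash x := e : s\ \mathrm{cmd}{\Downarrow}}$

(subs)      $\dfrac{E \vdash c : \rho \qquad \rho \le \rho'}{E \vdash c : \rho'}$

(compose)   $\dfrac{E \vdash c_i : \rho \ (i = 1,2)}{E \vdash c_1 ; c_2 : \rho}$

(parallel)  $\dfrac{E \vdash c_i : \rho \ (i = 1,2)}{E \vdash c_1 \,|\, c_2 : \rho}$

(if)        $\dfrac{E \vdash e : \mathrm{sec}(\rho) \qquad E \vdash c_i : \rho \ (i = 1,2)}{E \vdash \mathtt{if}\ e\ \mathtt{then}\ c_1\ \mathtt{else}\ c_2 : \rho}$

(while)     $\dfrac{E \vdash e : \bot \qquad E \vdash c : \bot\ \mathrm{cmd}{\Downarrow}}{E \vdash \mathtt{while}\ e\ \mathtt{do}\ c : \bot\ \mathrm{cmd}{\Uparrow}}$

Fig. 2. Typing System of Smith-Volpano calculus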



which makes explicit the notion of termination in the original types. With this 
mapping, the present system is a conservative extension of the original one in 
both subtyping judgement and typability. 
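
The rules of Figure 2 are syntax-directed up to subsumption, so they can be turned into a small checker. The following is a sketch under stated assumptions rather than an implementation taken from [33]: secrecy levels are restricted to a two-point chain of integers, the checker infers the highest command type of each phrase (lower ones follow by subsumption), and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

BOT, TOP = 0, 1  # assumed two-point chain of secrecy levels: 0 = low, 1 = high

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Const:
    value: bool

@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Skip:
    pass

@dataclass(frozen=True)
class Assign:
    target: str
    expr: "Expr"

@dataclass(frozen=True)
class Seq:
    left: "Cmd"
    right: "Cmd"

@dataclass(frozen=True)
class Par:
    left: "Cmd"
    right: "Cmd"

@dataclass(frozen=True)
class If:
    guard: "Expr"
    then: "Cmd"
    orelse: "Cmd"

@dataclass(frozen=True)
class While:
    guard: "Expr"
    body: "Cmd"

Expr = Union[Var, Const, And]
Cmd = Union[Skip, Assign, Seq, Par, If, While]
CmdType = Tuple[int, bool]  # (secrecy level, True if convergent)

class IllTyped(Exception):
    pass

def sec(e: Expr, env: Dict[str, int]) -> int:
    """Least level at which the expression e can be typed."""
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Const):
        return BOT
    return max(sec(e.left, env), sec(e.right, env))

def common(t1: CmdType, t2: CmdType) -> CmdType:
    """Highest command type to which both t1 and t2 can be subsumed."""
    (s1, conv1), (s2, conv2) = t1, t2
    if conv1 and conv2:
        return (min(s1, s2), True)
    if not conv1 and not conv2 and s1 == s2:
        return (s1, False)
    if conv1 and not conv2 and s2 <= s1:
        return (s2, False)
    if conv2 and not conv1 and s1 <= s2:
        return (s1, False)
    raise IllTyped("no common command type")

def check(c: Cmd, env: Dict[str, int]) -> CmdType:
    """Infer the highest command type of c under env, or raise IllTyped."""
    if isinstance(c, Skip):
        return (TOP, True)
    if isinstance(c, Assign):
        if sec(c.expr, env) > env[c.target]:
            raise IllTyped(f"insecure flow into {c.target}")
        return (env[c.target], True)
    if isinstance(c, (Seq, Par)):
        return common(check(c.left, env), check(c.right, env))
    if isinstance(c, If):
        level, conv = common(check(c.then, env), check(c.orelse, env))
        if sec(c.guard, env) > level:
            raise IllTyped("guard is more secret than the branches")
        return (level, conv)
    if isinstance(c, While):
        _, conv = check(c.body, env)
        if sec(c.guard, env) != BOT or not conv:
            raise IllTyped("while needs a bottom-level guard and a convergent body")
        return (BOT, False)
    raise TypeError(f"not a command: {c!r}")
```

For example, with x and y high and z and w low, the sequential composition `Seq(Assign("x", Var("y")), Assign("z", Var("w")))` is accepted, while `Assign("z", Var("x"))` is rejected.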

5.2 Embedding 

We start with the embedding of types and bases, given in Figure 3. Both
command types and bases are translated into two forms, one using channel types and
the other using action types. In $[\![\rho]\!]$, a terminating type becomes a truly linear
synchronisation type, and a non-terminating type becomes a non-linear
synchronisation type, both described in §3.5. $\langle\!\langle\rho\rangle\!\rangle_f$ gives an action type accordingly. We
note: given command types $\rho, \rho'$, we have $\rho \le \rho'$ iff either (1) $\mathrm{sec}([\![\rho]\!]) \ge \mathrm{sec}([\![\rho']\!])$ and
both are truly linear unary, (2) $\mathrm{sec}([\![\rho]\!]) \ge \mathrm{sec}([\![\rho']\!])$, $[\![\rho]\!]$ is truly linear unary and
$[\![\rho']\!]$ is nonlinear, or (3) $[\![\rho]\!] = [\![\rho']\!]$ and both are nonlinear. This dissects
command types into (a) the secrecy level of the whole behaviour (which guarantees
the lowest tampering level and which can be degraded by the degradation rule)
and (b) the nature of the termination behaviour (noting “non-linear” means a
termination action is not guaranteed).

We next turn to the embedding of terms into processes in Figure 3. The
framework assumes two boolean constant agents whose behaviours are given
in §2.2 and which are shared by all processes, with principal channels $\mathit{tt}$ and $\mathit{ff}$.
These free channels are given the $\top$-level, which is in accordance with Smith and
Volpano’s idea that constants have no secrecy. Following the translation of types,
each command becomes a process that upon termination emits an output signal
at a channel given as a parameter, typically $f$ (cf. [26]). We are using the copy-cat
in §3.5 to represent the functionality of value passing. The encoding of terms
should be easily understandable, following the known treatment as in [26]: the
[Figure 3: defining equations in four groups, (Type and Base), (Command), (Expression) and (Security of an expression).]

Fig. 3. Translation of the Smith-Volpano calculus



interest however lies in how typability is transformed via the embedding, and how
this transformation sheds light on safe information flow in the original system.
The following theorem underpins this point. Below $\overline{A}$ dualises each mode in A
which is assigned to a name, taking ? as the dual of !.

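Theorem 2 (Soundness). If $E \vdash c : \rho$, then $[\![E]\!] \cdot f : [\![\rho]\!] \vdash_s [\![E \vdash c : \rho]\!]_f \triangleright \overline{\langle\!\langle E \rangle\!\rangle} \otimes \langle\!\langle \rho \rangle\!\rangle_f$
with $s = \mathrm{sec}(\rho)$.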

A significant consequence of Theorem 2 is that we obtain, via the non-interference
of typed processes mentioned in Section 4, the original non-interference result
by Volpano and Smith [33]. The result holds for all terms typable by the rules in
Figure 2, including typed terms not coming from [33]. As another significant
point, the encoding illustrates the reason why the divergent command types
cannot be elevated as the convergent ones. Let $E \vdash \mathtt{while}\ e\ \mathtt{do}\ c : s\ \mathrm{cmd}{\Uparrow}$. In the
encoding, the body of the loop, which is at level s, depends on the branching
at level $\mathrm{sec}(e)_E \le s$; lowering s can make this dependency dangerous, hence we
cannot degrade $s\ \mathrm{cmd}{\Uparrow}$ as in the convergent types. Also note this argument does
not use the restriction $s = \bot$ in the original type discipline.

5.3 Termination as Observable 

After the preceding development, a natural question is whether we obtain any 
new information by doing such an endeavour or not. In this section we outline 
a technical development which may answer affirmatively to the question. 

We first return to the restriction of the original system that allows only the
level $\bot$ for divergent commands. This does seem a strong constraint, especially
with multiple security levels. How does this constraint appear in process
representation? It means we only assign the non-linear synchronisation type at level
$\bot$ to a channel for the termination, which makes explicit the notion of
termination as an observable, both as types and as behaviours. Once we have this
notion, we ask what is the real content of having the observable only at $\bot$.
Clearly the answer is: “we allow everybody to observe the termination.” We may
then ask what would be the outcome of not allowing everybody to observe the
termination. Can this make sense? It seems it does: since the time of Multics and
as was recently introduced in a widely known programming language [12], a
mechanism by which we can prevent processes from even realising the presence of
other processes, depending on assigned security levels, is a well-established
idea in security, both from integrity and secrecy concerns.

Further, there is a technically important observation that the encoding in 
Figure 3 does not apparently impose restriction on levels of divergent types: 
indeed the argument for Theorem 2 hardly depends on it. Thus we generalise 
the while rule as follows. 

(while)  $\dfrac{E \vdash e : s \qquad E \vdash c : s\ \mathrm{cmd}{\Downarrow}}{E \vdash \mathtt{while}\ e\ \mathtt{do}\ c : s\ \mathrm{cmd}{\Uparrow}}$

The new rule is significant in its loosened condition on the guard of the loop, 
allowing us to type, say, (with m being a level between h and l), while do c : 
H cmdfr. With exactly the same encoding, we obtain the soundness result for the 
extended system with a statement identical to Theorem 2. 
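
The effect of the loosened condition can be seen by computing which levels s the generalised rule admits for a given loop. The sketch below assumes a chain of integer levels (0 for the lowest), takes the least level of the guard and the highest convergent level of the body as inputs, and uses hypothetical function names.

```python
from typing import List


def generalised_while_levels(guard_level: int, body_level: int) -> List[int]:
    """Levels s with e : s and c : s cmd (convergent), i.e. guard_level <= s <= body_level."""
    return list(range(guard_level, body_level + 1))


def original_while_levels(guard_level: int, body_level: int, bottom: int = 0) -> List[int]:
    """The original rule only admits s = bottom, and only for a bottom-level guard."""
    return [bottom] if guard_level == bottom <= body_level else []
```

On a three-point chain with 0, 1 and 2 standing for low, middle and high, a loop with a middle-level guard and a high-level body has `generalised_while_levels(1, 2) == [1, 2]`, while the original rule admits no level for it at all.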

Further, this new soundness result leads to a non-interference for the
extended imperative calculus just as in the original calculus. The formulation is
however different since termination behaviours can change between two initial
configurations if we set different values at levels lower than the termination
observable. Fixing a base E, let s' be a stipulated level of observability of the
termination, and assume there are two environments (assignments of truth
values to variables) which are equivalent with respect to s', i.e. they only differ in
variables at levels higher than s'. Suppose also s is the level of the command
type of a well-typed c under E and $s \le s'$ (thus if c includes a while command,
its guard is not affected by the content of variables at levels above s'). Then
if c terminates under one of these environments, it will also terminate under
the other environment, and the two resulting environments are equivalent with
respect to s' (hence with respect to s). If we are without the condition $s \le s'$, we
cannot guarantee the same consequence, though observables except the
termination at each state are equivalent with respect to s', related in a coinductive way.
See [16] for details. Thus we are again guaranteed secure information flow with
added typability, by starting from a typed process representation of imperative
program behaviour.
Abstract. The domain of definite Boolean functions, Def, can be used 
to express the groundness of, and trace grounding dependencies between, 
program variables in (constraint) logic programs. In this paper, 
previously unexploited computational properties of Def are utilised to 
develop an efficient and succinct groundness analyser that can be coded 
in Prolog. In particular, entailment checking is used to prevent unneces- 
sary least upper bound calculations. It is also demonstrated that join 
can be defined in terms of other operations, thereby eliminating code 
and removing the need for preprocessing formulae to a normal form. 
This saves space and time. Furthermore, the join can be adapted to 
straightforwardly implement the downward closure operator that arises 
in set sharing analyses. Experimental results indicate that the new Def 
implementation gives favourable results in comparison with BDD-based 
groundness analyses. 

Keywords: Abstract interpretation, (constraint) logic programs, defi- 
nite Boolean functions, groundness analysis. 



1 Introduction 

Groundness analysis is an important theme of logic programming and abstract 
interpretation. Groundness analyses identify those program variables bound to 
terms that contain no variables (ground terms). Groundness information is typically 
inferred by tracking dependencies among program variables. These dependencies 
are commonly expressed as Boolean functions. For example, the function 
$x \wedge (y \leftarrow z)$ describes a state in which x is definitely ground, and there exists a 
grounding dependency such that whenever z becomes ground then so does y. 

Groundness analyses usually track dependencies using either Pos [3,4,8,15,21], 
the class of positive Boolean functions, or Def [1,16,18], the class of definite positive 
functions. Pos is more expressive than Def, but Def analysers can be faster 
[1] and, in practice, the loss of precision for goal-dependent groundness analysis 
is usually small [18]. This paper is a sequel to [18] and is an exploration of using 
Prolog as a medium for implementing a Def analyser. The rationale for this work 
was partly to simplify compiler integration and partly to deliver an analyser that 
was small and thus easy to maintain. Furthermore, it has been suggested that 
the Prolog user community is not large enough to warrant a compiler vendor 

G. Smolka (Ed.): ESOP/ETAPS 2000, LNCS 1782, pp. 200-214, 2000. 
making a large investment in developing an analyser. Thus any analysis that can 
be quickly prototyped in Prolog is particularly attractive. The main drawback 
of this approach has traditionally been performance. 

The efficiency of groundness analysis depends critically on the way dependen- 
cies are represented. C and Prolog based Def analysers have been constructed 
around two representations: (1) Armstrong et al [1] argue that Dual Blake Cano- 
nical Form (DBCF) is suitable for representing Def. This represents functions 
as conjunctions of definite (propositional) clauses [12] maintained in a normal 
(orthogonal) form that makes explicit transitive variable dependencies. For example, 
the function $(x \leftarrow y) \wedge (y \leftarrow z)$ is represented as $(x \leftarrow (y \vee z)) \wedge (y \leftarrow z)$. 
Garcia de la Banda et al [16] adopt a similar representation. It simplifies join 
and projection at the cost of computing and representing the (extra) transitive 
dependencies. Introducing redundant dependencies is best avoided since pro- 
gram clauses can (and sometimes do) contain large numbers of variables; the 
speed of analysis is often related to its memory usage. (2) King et al show how 
meet, join and projection can be implemented with quadratic operations based 
on a Sharing quotient [18]. Def functions are essentially represented as a set 
of models and widening is thus required to keep the size of the representation 
manageable. Widening trades precision for time and space. Ideally, however, it 
would be better to avoid widening by, say, using a more compact representation. 

This paper contributes to Def analysis by pointing out that Def has impor- 
tant (previously unexploited) computational properties that enable Def to be 
implemented efficiently and coded straightforwardly in Prolog. Specifically, the 
paper details: 

— how functions can be represented succinctly with non-ground formulae. 

— how to compute the join of two formulae without preprocessing the formulae 
into orthogonal form [1]. 

— how entailment checking and Prolog machinery, such as difference lists and 
delay declarations, can be used to obtain a Def analysis in which the most 
frequently used domain operations are very lightweight. 

— that the speed of an analysis based on non-ground formulae can compare 
well against BDD-based Def and Pos analyses whose domain operations are 
coded in C [1]. In addition, even without widening, a non-ground formulae 
analyser can be significantly faster than a Sharing-based Def analyser [18]. 

Finally, a useful spin-off of our work is a result that shows how the downward 
closure operator that arises in BDD-based set sharing analysis [10] can be im- 
plemented straightforwardly with standard BDD operations. This saves the im- 
plementor the task of coding another BDD operation in C. 

The rest of the paper is structured as follows: Section 2 details the necessary 
preliminaries. Section 3 explains how join can be calculated without resorting to 
a normal form and also details an algorithm for computing downward closure. 
Section 4 investigates the frequency of various Def operations and explains how 
representing functions as (non-ground) formulae enables the frequently occurring 
Def operations to be implemented particularly efficiently using, for example, 
entailment checking. Section 5 evaluates a non-ground Def analyser against two 
BDD analysers. Sections 6 and 7 describe the related and future work, and 
section 8 concludes. 

2 Preliminaries 

A Boolean function is a function $f : \mathit{Bool}^n \to \mathit{Bool}$ where $n > 0$. A Boolean 
function can be represented by a propositional formula over X where $|X| = n$. 
The set of propositional formulae over X is denoted by $\mathit{Bool}_X$. Throughout this 
paper, Boolean functions and propositional formulae are used interchangeably 
without worrying about the distinction [1]. The convention of identifying a truth 
assignment with the set of variables M that it maps to true is also followed. 
Specifically, a map $\psi_X : \wp(X) \to \mathit{Bool}_X$ is introduced, defined by: 
$\psi_X(M) = (\wedge M) \wedge (\neg \vee (X \setminus M))$. In addition, the formula $\wedge Y$ is often abbreviated as Y. 

Definition 1. The (bijective) map $\mathit{model}_X : \mathit{Bool}_X \to \wp(\wp(X))$ is defined by: 
$\mathit{model}_X(f) = \{M \subseteq X \mid \psi_X(M) \models f\}$. 

Example 1. If $X = \{x, y\}$, then the function $\{\langle \mathit{true}, \mathit{true} \rangle \mapsto \mathit{true}, \langle \mathit{true}, \mathit{false} \rangle \mapsto \mathit{false}, 
\langle \mathit{false}, \mathit{true} \rangle \mapsto \mathit{false}, \langle \mathit{false}, \mathit{false} \rangle \mapsto \mathit{false}\}$ can be represented by the 
formula $x \wedge y$. Also, $\mathit{model}_X(x \wedge y) = \{\{x, y\}\}$ and $\mathit{model}_X(x \vee y) = \{\{x\}, \{y\}, \{x, y\}\}$. 

The focus of this paper is on the use of sub-classes of Boolx in tracing 
groundness dependencies. These sub-classes are defined below: 

Definition 2. $\mathit{Pos}_X$ is the set of positive Boolean functions over X. A function 
f is positive iff $X \in \mathit{model}_X(f)$. $\mathit{Def}_X$ is the set of positive functions over 
X that are definite. A function f is definite iff $M \cap M' \in \mathit{model}_X(f)$ for all 
$M, M' \in \mathit{model}_X(f)$. 

Note that $\mathit{Def}_X \subseteq \mathit{Pos}_X$. One useful representational property of $\mathit{Def}_X$ is that 
each $f \in \mathit{Def}_X$ can be described as a conjunction of definite (propositional) 
clauses, that is, $f = \bigwedge_{i=1}^{m} (y_i \leftarrow Y_i)$ [12]. 

Example 2. Suppose $X = \{x, y, z\}$ and consider the following table, which states, 
for some Boolean functions, whether they are in $\mathit{Def}_X$ or $\mathit{Pos}_X$ and also gives 
$\mathit{model}_X$. 

f                            Def_X  Pos_X  model_X(f)
$\mathit{false}$                                   $\emptyset$
$x \wedge y$                   •      •      $\{\{x,y\}, \{x,y,z\}\}$
$x \vee y$                            •      $\{\{x\}, \{y\}, \{x,y\}, \{x,z\}, \{y,z\}, \{x,y,z\}\}$
$x \leftarrow y$               •      •      $\{\emptyset, \{x\}, \{z\}, \{x,y\}, \{x,z\}, \{x,y,z\}\}$
$x \vee (y \leftarrow z)$             •      $\{\emptyset, \{x\}, \{y\}, \{x,y\}, \{x,z\}, \{y,z\}, \{x,y,z\}\}$
$\mathit{true}$                •      •      $\{\emptyset, \{x\}, \{y\}, \{z\}, \{x,y\}, \{x,z\}, \{y,z\}, \{x,y,z\}\}$

Note, in particular, that $x \vee y$ is not in $\mathit{Def}_X$ (since its set of models is not closed 
under intersection) and that $\mathit{false}$ is neither in $\mathit{Pos}_X$ nor $\mathit{Def}_X$. 
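The membership tests of Definition 2 can be checked mechanically on small variable sets. The following is a minimal Python sketch, not the paper's implementation: it assumes formulae are encoded as predicates over the set M of true variables, and the helper names (`powerset`, `models`, `is_positive`, `is_definite`, `in_def`) are illustrative only.

```python
from itertools import combinations

def powerset(xs):
    """All subsets of xs, as frozensets."""
    xs = sorted(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def models(f, X):
    """model_X(f): the sets M of true variables under which predicate f holds."""
    return {M for M in powerset(X) if f(M)}

def is_positive(f, X):
    """f is positive iff X itself is a model."""
    return frozenset(X) in models(f, X)

def is_definite(f, X):
    """f is definite iff its models are closed under intersection."""
    ms = models(f, X)
    return all((a & b) in ms for a in ms for b in ms)

def in_def(f, X):
    """Def_X membership: positive and definite."""
    return is_positive(f, X) and is_definite(f, X)

X = {"x", "y", "z"}
table = {
    "false":        lambda M: False,
    "x & y":        lambda M: "x" in M and "y" in M,
    "x | y":        lambda M: "x" in M or "y" in M,
    "x <- y":       lambda M: "x" in M or "y" not in M,
    "x | (y <- z)": lambda M: "x" in M or "y" in M or "z" not in M,
    "true":         lambda M: True,
}
for name, f in table.items():
    print(name, in_def(f, X), is_positive(f, X), len(models(f, X)))
```

Enumerating all subsets is exponential in |X|, so this is only useful for checking small examples such as the table above.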
Fig. 1. Hasse diagrams 



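Defining $f_1 \dot\vee f_2 = \wedge\{f \in \mathit{Def}_X \mid f_1 \models f \wedge f_2 \models f\}$, the 4-tuple $\langle \mathit{Def}_X, \models, \wedge, \dot\vee \rangle$ is 
a finite lattice [1], where $\mathit{true}$ is the top element and $\wedge X$ is the bottom element. 
Existential quantification is defined by Schröder's Elimination Principle, that is, 
$\exists x.f = f[x \mapsto \mathit{true}] \vee f[x \mapsto \mathit{false}]$. Note that if $f \in \mathit{Def}_X$ then $\exists x.f \in \mathit{Def}_X$ 
[1]. 

Example 3. If $X = \{x, y\}$ then $x \dot\vee (x \leftrightarrow y) = \wedge\{(x \leftarrow y), \mathit{true}\} = (x \leftarrow y)$, as 
can be seen in the Hasse diagram for dyadic $\mathit{Def}_X$ (Fig. 1). Note also that $x \dot\vee y 
= \wedge\{\mathit{true}\} = \mathit{true} \neq (x \vee y)$. 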

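The set of (free) variables in a syntactic object o is denoted $\mathit{var}(o)$. Also, 
$\exists\{y_1, \ldots, y_n\}.f$ (project out) abbreviates $\exists y_1 \ldots \exists y_n.f$ and $\overline{\exists}Y.f$ (project onto) 
denotes $\exists(\mathit{var}(f) \setminus Y).f$. Let $\rho_1, \rho_2$ be fixed renamings such that $X \cap \rho_1(X) = 
X \cap \rho_2(X) = \rho_1(X) \cap \rho_2(X) = \emptyset$. Renamings are bijective and therefore invertible. 
The downward and upward closure operators ${\downarrow}$ and ${\uparrow}$ are defined by ${\downarrow}f = 
\mathit{model}_X^{-1}(\{\cap S \mid \emptyset \subset S \subseteq \mathit{model}_X(f)\})$ and ${\uparrow}f = \mathit{model}_X^{-1}(\{\cup S \mid \emptyset \subset S \subseteq 
\mathit{model}_X(f)\})$ respectively. Note that ${\downarrow}f$ has the useful computational property 
that ${\downarrow}f = \wedge\{f' \in \mathit{Def}_X \mid f \models f'\}$ if $f \in \mathit{Pos}_X$. Finally, for any $f \in \mathit{Bool}_X$, 
$\mathit{coneg}(f) = \mathit{model}_X^{-1}(\{X \setminus M \mid M \in \mathit{model}_X(f)\})$. 

Example 4. Note that $\mathit{coneg}(x \vee y) = \mathit{model}_{\{x,y\}}^{-1}(\{\{x\}, \{y\}, \emptyset\})$ and therefore 
${\uparrow}\mathit{coneg}(x \vee y) = \mathit{true}$. Hence $\mathit{coneg}({\uparrow}\mathit{coneg}(x \vee y)) = \mathit{true} = {\downarrow}(x \vee y)$. 

This is no coincidence as $\mathit{coneg}({\uparrow}\mathit{coneg}(f)) = {\downarrow}f$. Therefore coneg and ${\uparrow}$ can be 
used to calculate ${\downarrow}$. 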
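The identity relating coneg and the two closure operators can be tried out directly on sets of models. The sketch below is an illustration under stated assumptions rather than BDD code: functions are represented by their model sets (frozensets of true variables), and `down`, `up` and `coneg` are hypothetical helper names.

```python
from itertools import combinations

def nonempty_subfamilies(ms):
    """Every non-empty sub-collection S of the given set of models."""
    ms = list(ms)
    for r in range(1, len(ms) + 1):
        yield from combinations(ms, r)

def down(ms):
    """Downward closure: close the model set under intersection."""
    return {frozenset.intersection(*S) for S in nonempty_subfamilies(ms)}

def up(ms):
    """Upward closure: close the model set under union."""
    return {frozenset.union(*S) for S in nonempty_subfamilies(ms)}

def coneg(ms, X):
    """Complement every model with respect to X."""
    X = frozenset(X)
    return {X - M for M in ms}

# Example 4: f = x \/ y over X = {x, y}
X = {"x", "y"}
f = {frozenset({"x"}), frozenset({"y"}), frozenset({"x", "y"})}
via_coneg = coneg(up(coneg(f, X)), X)
print(via_coneg == down(f))
```

The enumeration of sub-collections is exponential in the number of models, which is acceptable only for toy inputs such as Example 4.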

3 Join and Downward Closure 

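Calculating join in Def is not as straightforward as one would initially think, 
because of the problem of transitive dependencies. Suppose $f_1, f_2 \in \mathit{Def}_X$ so 
that $f_i = \wedge F_i$ where $F_i = \{y_1^i \leftarrow Y_1^i, \ldots, y_{n_i}^i \leftarrow Y_{n_i}^i\}$. One naive tactic to 
compute $f_1 \dot\vee f_2$ might be $F = \{y \leftarrow Y_j^1 \wedge Y_k^2 \mid y \leftarrow Y_j^1 \in F_1 \wedge y \leftarrow Y_k^2 \in F_2\}$. 
Unfortunately, in general, $\wedge F \neq f_1 \dot\vee f_2$ as is illustrated in the following example. 

Example 5. Put $F_1 = \{x \leftarrow u, u \leftarrow y\}$ and $F_2 = \{x \leftarrow v, v \leftarrow y\}$ so that 
$F = \{x \leftarrow u \wedge v\}$, but $f_1 \dot\vee f_2 = (x \leftarrow (u \wedge v)) \wedge (x \leftarrow y) \neq \wedge F$. Note, however, 
that if $F_1 = \{x \leftarrow u, u \leftarrow y, x \leftarrow y\}$ and $F_2 = \{x \leftarrow v, v \leftarrow y, x \leftarrow y\}$ then 
$F = \{x \leftarrow (u \wedge v), x \leftarrow (u \wedge y), x \leftarrow (v \wedge y), x \leftarrow y\}$ so that $f_1 \dot\vee f_2 = \wedge F$. 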
The problem is that $F_i$ must be explicit about transitive dependencies (this idea 
is captured in the orthogonal form requirement of [1]). This, however, leads to 
redundancy in the formula which ideally should be avoided. (Formulae which are not 
necessarily orthogonal will henceforth be referred to as non-orthogonal formulae.) 

It is insightful to consider $\dot\vee$ as an operation on the models of $f_1$ and $f_2$. Since 
both $\mathit{model}_X(f_i)$ are closed under intersection, $\dot\vee$ essentially needs to extend 
$\mathit{model}_X(f_1) \cup \mathit{model}_X(f_2)$ with new models $M_1 \cap M_2$ where $M_i \in \mathit{model}_X(f_i)$ 
to compute $f_1 \dot\vee f_2$. The following definition expresses this observation and leads 
to a new way of computing $\dot\vee$ in terms of meet, renaming and projection, that 
does not require formulae to be first put into orthogonal form. 

Definition 3. The map $\curlyvee : \mathit{Bool}_X^2 \to \mathit{Bool}_X$ is defined by: $f_1 \curlyvee f_2 = \overline{\exists}Y.(f_1 \curlyvee' f_2)$ 
where $Y = \mathit{var}(f_1) \cup \mathit{var}(f_2)$ and $f_1 \curlyvee' f_2 = \rho_1(f_1) \wedge \rho_2(f_2) \wedge \bigwedge_{y \in Y} (y \leftrightarrow (\rho_1(y) \wedge \rho_2(y)))$. 

Note that $\curlyvee$ operates on $\mathit{Bool}_X$ rather than $\mathit{Def}_X$. This is required for the 
downward closure operator. Lemma 1 expresses a key relationship between $\curlyvee$ 
and the models of $f_1$ and $f_2$. 

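Lemma 1. Let $f_1, f_2 \in \mathit{Bool}_X$. $M \in \mathit{model}_X(f_1 \curlyvee f_2)$ if and only if there exists 
$M_i \in \mathit{model}_X(f_i)$ such that $M = M_1 \cap M_2$. 

Proof. Put $X' = X \cup \rho_1(X) \cup \rho_2(X)$. 

Let $M \in \mathit{model}_X(f_1 \curlyvee f_2)$. There exists $M \subseteq M' \subseteq X'$ such that $M' \in 
\mathit{model}_{X'}(f_1 \curlyvee' f_2)$. Let $M_i = \rho_i^{-1}(M' \cap \rho_i(Y))$. Observe that $M \subseteq M_1 \cap M_2$ since 
$(\rho_1(y) \wedge \rho_2(y)) \leftarrow y$. Also observe that $M_1 \cap M_2 \subseteq M$ since $y \leftarrow (\rho_1(y) \wedge \rho_2(y))$. 
Thus $M_i \in \mathit{model}_X(f_i)$ and $M = M_1 \cap M_2$, as required. 

Let $M_i \in \mathit{model}_X(f_i)$ and put $M = M_1 \cap M_2$ and $M' = M \cup \rho_1(M_1) \cup \rho_2(M_2)$. 
Observe $M' \in \mathit{model}_{X'}(f_1 \curlyvee' f_2)$ so that $M \in \mathit{model}_X(f_1 \curlyvee f_2)$. ■ 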

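From lemma 1 flows the following corollary and also the useful result that $\curlyvee$ is 
monotonic. 

Corollary 1. Let $f \in \mathit{Pos}_X$. Then $f = f \curlyvee f$ if and only if $f \in \mathit{Def}_X$. 

Lemma 2. $\curlyvee$ is monotonic, that is, $f_1 \curlyvee f_2 \models f_1' \curlyvee f_2'$ whenever $f_i \models f_i'$. 

Proof. Let $M \in \mathit{model}_X(f_1 \curlyvee f_2)$. By lemma 1, there exist $M_i \in \mathit{model}_X(f_i)$ such 
that $M = M_1 \cap M_2$. Since $f_i \models f_i'$, $M_i \in \mathit{model}_X(f_i')$ hence, by lemma 1, 
$M \in \mathit{model}_X(f_1' \curlyvee f_2')$. ■ 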

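The following proposition states that $\curlyvee$ coincides with $\dot\vee$ on $\mathit{Def}_X$. This gives a 
simple algorithm for calculating $\dot\vee$ that does not depend on the representation 
of a formula. 

Proposition 1. Let $f_1, f_2 \in \mathit{Def}_X$. Then $f_1 \curlyvee f_2 = f_1 \dot\vee f_2$. 

Proof. Since $X \models f_2$ it follows by monotonicity that $f_1 = f_1 \curlyvee X \models f_1 \curlyvee f_2$ and 
similarly $f_2 \models f_1 \curlyvee f_2$. Hence $f_1 \dot\vee f_2 \models f_1 \curlyvee f_2$ by the definition of $\dot\vee$. 

Now let $M \in \mathit{model}_X(f_1 \curlyvee f_2)$. By lemma 1, there exists $M_i \in \mathit{model}_X(f_i)$ 
such that $M = M_1 \cap M_2 \in \mathit{model}_X(f_1 \dot\vee f_2)$. Hence $f_1 \curlyvee f_2 \models f_1 \dot\vee f_2$. ■ 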
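Definition 3 and Lemma 1 can be cross-checked by brute force on Example 5. The Python sketch below is an illustration only, under several assumptions: formulae are predicates over the set of true variables, the renamings are modelled by appending the hypothetical suffixes `_1` and `_2`, and Y is taken to be the whole of X for simplicity.

```python
from itertools import combinations

def powerset(xs):
    xs = sorted(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def models(f, X):
    return {M for M in powerset(X) if f(M)}

def curly_join(f1, f2, X):
    """Models of the curly join: conjoin renamed copies, link each y to
    rho1(y) /\\ rho2(y), then project back onto X (brute force)."""
    X = sorted(X)
    rho1 = {v: v + "_1" for v in X}   # renaming rho_1 (assumed naming scheme)
    rho2 = {v: v + "_2" for v in X}   # renaming rho_2 (assumed naming scheme)
    primed = X + [rho1[v] for v in X] + [rho2[v] for v in X]
    result = set()
    for Mp in powerset(primed):
        M1 = frozenset(v for v in X if rho1[v] in Mp)   # rho_1^{-1}(M')
        M2 = frozenset(v for v in X if rho2[v] in Mp)   # rho_2^{-1}(M')
        linked = all((v in Mp) == (rho1[v] in Mp and rho2[v] in Mp) for v in X)
        if linked and f1(M1) and f2(M2):
            result.add(frozenset(v for v in X if v in Mp))  # project onto X
    return result

# Example 5: F1 = {x <- u, u <- y}, F2 = {x <- v, v <- y}
X = {"x", "u", "v", "y"}
f1 = lambda M: ("x" in M or "u" not in M) and ("u" in M or "y" not in M)
f2 = lambda M: ("x" in M or "v" not in M) and ("v" in M or "y" not in M)
expected = lambda M: ("x" in M or not {"u", "v"} <= M) and ("x" in M or "y" not in M)
naive = lambda M: "x" in M or not {"u", "v"} <= M

joined = curly_join(f1, f2, X)
by_lemma1 = {a & b for a in models(f1, X) for b in models(f2, X)}
print(joined == by_lemma1, joined == models(expected, X), joined == models(naive, X))
```

The comparison with `naive` shows why the clause-pairing tactic of Example 5 loses the dependency of x on y when the inputs are not in orthogonal form.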
Downward closure is closely related to $\curlyvee$ and, in fact, $\curlyvee$ can be used repeatedly 
to compute a finite iterative sequence that converges to ${\downarrow}$. This is stated 
in proposition 2. Finiteness follows from bounded chain length of $\mathit{Pos}_X$. 

Proposition 2. Let $f \in \mathit{Pos}_X$. Then ${\downarrow}f = \vee_{i \geq 1} f_i$ where $f_i \in \mathit{Pos}_X$ is the 
increasing chain given by: $f_1 = f$ and $f_{i+1} = f_i \curlyvee f_i$. 

Proof. Let $M \in \mathit{model}_X({\downarrow}f)$. Thus there exists $M_j \in \mathit{model}_X(f)$ such that 
$M = \cap_{j=1}^{m} M_j$. Observe $M_1 \cap M_2, M_3 \cap M_4, \ldots \in \mathit{model}_X(f_2)$ and therefore 
$M \in \mathit{model}_X(f_{\lceil \log_2(m) \rceil})$. Since $m \leq 2^{2^n}$ where $n = |X|$ it follows that ${\downarrow}f \models f_{2^n}$. 

Proof by induction is used for the opposite direction. Observe that $f_1 \models {\downarrow}f$. 
Suppose $f_i \models {\downarrow}f$. Let $M \in \mathit{model}_X(f_{i+1})$. By lemma 1 there exists $M_1, M_2 \in 
\mathit{model}_X(f_i)$ such that $M = M_1 \cap M_2$. By the inductive hypothesis $M_1, M_2 \in 
\mathit{model}_X({\downarrow}f)$ thus $M \in \mathit{model}_X({\downarrow}f)$. Hence $f_{i+1} \models {\downarrow}f$. 

Finally, $\vee_{i \geq 1} f_i \in \mathit{Def}_X$ since $f_1 \in \mathit{Pos}_X$ and $\curlyvee$ is monotonic and thus 
$X \in \mathit{model}_X(\vee_{i \geq 1} f_i)$. ■ 

The significance of this is that it enables ${\downarrow}$ to be computed in terms of existing 
BDD operations thus freeing the implementor from more low level coding. 
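The iteration of Proposition 2 can be sketched on model sets, using the characterisation of Lemma 1 (pairwise intersection of models) in place of real BDD operations. This is an assumption-laden illustration; `curly_join_models` and `down_closure` are hypothetical names.

```python
def curly_join_models(ms1, ms2):
    """Lemma 1 view of the curly join: all pairwise intersections of models."""
    return {a & b for a in ms1 for b in ms2}

def down_closure(ms):
    """Iterate f_{i+1} = f_i curly-join f_i until the chain stabilises.
    Returns the final model set and the index i at which it was reached."""
    current, i = set(ms), 1
    while True:
        nxt = curly_join_models(current, current)
        if nxt == current:
            return current, i
        current, i = nxt, i + 1

# f = (x /\ y) \/ (y /\ z) \/ (x /\ z) over X = {x, y, z}
f = {frozenset("xy"), frozenset("yz"), frozenset("xz"), frozenset("xyz")}
closed, steps = down_closure(f)
print(len(closed), steps)
```

Because every model intersects with itself, each step contains the previous one, so the chain is increasing and must stabilise on a finite X.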



4 Design and Implementation 

There are typically many degrees of freedom in designing an analyser, even 
for a given domain. Furthermore, work can often be shifted from one abstract 
operation into another. For example, Garcia de la Banda et al [16] maintain 
DBCF by a meet that uses six rewrite rules to normalise formulae. This gives a 
linear time join and projection at the expense of an exponential meet. Conversely, 
King et al [18] have meet, join and projection operations that are quadratic in 
the number of models. Note, however, that the number of models is exponential 
(explaining the need for widening). Ideally, an analysis should be designed so that 
the most frequently used operations have low complexity and are therefore fast. 



4.1 Frequency Analysis 

In order to balance the frequency of an abstract operation against its cost, a 
BDD-based Def analyser was implemented and instrumented to count the num- 
ber of calls to the various abstract operations. The BDD-based Def analyser is 
coded in Prolog as a simple meta-interpreter that uses induced magic-sets [7] 
and eager evaluation [22] to perform goal-dependent bottom-up evaluation. 

Induced magic is a refinement of the magic set transformation, avoiding much 
of the re-computation that arises because of the repetition of literals in the 
bodies of magicked clauses [7]. It also avoids the overhead of applying the magic 
set transformation. Eager evaluation [22] is a fixpoint iteration strategy which 
proceeds as follows: whenever an atom is updated with a new (less precise) 
abstraction, a recursive procedure is invoked to ensure that every clause that 
has that atom in its body is re-evaluated. Induced magic may not be as efficient 
as, say, GAIA [19] but it can be coded easily in Prolog. 

The BDD-based Def analysis is built on a ROBDD package coded by Armstrong 
and Schachte [1]. The package is intended for Pos analysis and therefore 
supplies a $\vee$ join rather than a $\dot\vee$ join. The package did contain, however, 
a hand-crafted C upward closure operator ${\uparrow}$ enabling $\dot\vee$ to be computed by 
$f_1 \dot\vee f_2 = {\downarrow}(f_1 \vee f_2)$ where ${\downarrow}f = \mathit{coneg}({\uparrow}\mathit{coneg}(f))$. The operation $\mathit{coneg}(f)$ can be 
computed simply by interchanging the left and right (true and false) branches 
of an ROBDD. The analyser also uses the environment trimming tactic used by 
Schachte to reduce the number of variables that occur in a ROBDD. Specifi- 
cally, clause variables are numbered and each program point is associated with a 
number, in such a way that if a variable has a number less than that associated 
with the program point, then it is redundant (does not occur to the right of the 
program point) and hence can be projected out. This optimisation is important 
in achieving practical analysis times for some large programs. 

The following table gives a breakdown of the number of calls to each abstract 
operation in the BDD-based Def analysis of eight large programs. Meet, join, 
equiv, project and rename are the obvious Boolean operations. Join (diff) is the 
number of calls to a join $f_1 \dot\vee f_2$ where $f_1 \dot\vee f_2 \neq f_1$ and $f_1 \dot\vee f_2 \neq f_2$. Project (trim) 
is the number of calls to project that stem from environment trimming. 



file            strips  chat_parser  sim_v5-2  peval  aircraft  essln  chat_80  aqua_c
meet               815         4471      2192   2198      7063   8406    15483  112455
join               236         1467       536    632      2742   1668     4663   35007
join (diff)         33          243         2    185        26    177      693    5173
equiv              236         1467       536    632      2742   1668     4663   35007
project            330         1774       788    805      3230   2035     5523   38163
project (trim)     173         1384       770    472      2082   2376     5627   42989
rename             857         4737      2052   2149      8963   5738    14540  103795


Observe that meet and rename are called most frequently and therefore, 
ideally, should be the most lightweight. Project, project (trim), join and equiv 
calls occur with similar frequency but note that it is rare for a join to differ from 
both its arguments. Join is always followed by an equivalence and this explains 
why the join and equiv rows coincide. 

Next, the complexity of ROBDD and DBCF (specialised for Def [1]) opera- 
tions are reviewed in relation to their calling frequency. Suggestions are made 
about balancing the complexity of an operation against its frequency by using a 
non-orthogonal formulae representation. 

For ROBDDs (DBCF) meet is quadratic (exponential) in the size of its argu- 
ments [1]. For ROBDDs (DBCF) these arguments are exponential (polynomial) 
in the number of variables. Representing Def functions as non-orthogonal for- 
mulae is attractive since meet is concatenation which can be performed in con- 
stant time (using difference lists). Renaming is quadratic for ROBDDs (linear 
for DBCF) in the size of its argument [1]. Renaming a non-orthogonal formula is 
$O(m \log n)$ where m (n) is the number of symbols (variables) in its argument. 
For ROBDDs (DBCF), join is quadratic (quartic) in the size of its argu- 
ments [1]. For non-orthogonal formulae, join is exponential. Note, however, that 
the majority of joins result in one of the operands and hence are unnecessary. 
This can be detected by using an entailment check which is quadratic in the 
size of the representation. Thus it is sensible to filter join through an entailment 
check so that join is called comparatively rarely. Therefore its complexity is less 
of an issue. Specifically, if $f_1 \models f_2$ then $f_1 \dot\vee f_2 = f_2$. For ROBDDs, equivalence 
checking is constant time, whereas for DBCF it is linear in the size of the re- 
presentation. For non-orthogonal formulae, equivalence is quadratic in the size 
of the representation. Observe that meet occurs more frequently than equality 
and therefore a gain should be expected from trading an exponential meet and 
a linear join for a constant time meet and an exponential join. 

For ROBDDs (DBCF), projection is quadratic (linear) in the size of its ar- 
guments [1]. For a non-orthogonal representation, projection is exponential, but 
again, entailment checking can be used to prevent the majority of projections. 

4.2 The GEP Representation 

A call (or answer) pattern is a pair $\langle a, f \rangle$ where a is an atom and $f \in \mathit{Def}$. 
Normally the arguments of a are distinct variables. The formula f is a conjunction 
(list) of propositional Horn clauses in the Def analysis described in 
this paper. In a non-ground representation the arguments of a can be instantiated 
and aliased to express simple dependency information [9]. For example, if 
$a = p(x_1, \ldots, x_5)$, then the atom $p(x_1, \mathit{true}, x_1, x_4, \mathit{true})$ represents a coupled with 
the formula $(x_1 \leftrightarrow x_3) \wedge x_2 \wedge x_5$. This enables the abstraction $\langle p(x_1, \ldots, x_5), f_1 \rangle$ to 
be collapsed to $\langle p(x_1, \mathit{true}, x_1, x_4, \mathit{true}), f_2 \rangle$ where $f_1 = (x_1 \leftrightarrow x_3) \wedge x_2 \wedge x_5 \wedge f_2$. 
This encoding leads to a more compact representation and is similar to the GER 
factorisation of ROBDDs proposed by Bagnara and Schachte [3] . The represen- 
tation of call and answer patterns described above is called GEP (groundness, 
equivalences and propositional clauses) where the atom captures the first two 
properties and the formula the latter. Note that the current implementation of 
the GEP representation does not avoid inefficiencies in the representation such 
as the repetition of Def formulae. 

4.3 Abstract Operations 

The GEP representation requires the abstract operations to be lifted from Boo- 
lean formulae to call and answer patterns. 



Meet The meet of the pairs $\langle a_1, f_1 \rangle$ and $\langle a_2, f_2 \rangle$ can be computed by unifying 
$a_1$ and $a_2$ and concatenating $f_1$ and $f_2$. 

Renaming The objects that require renaming are formulae and call (answer) 
pattern GEP pairs. If a dynamic database is used to store the pairs [17], then 
renaming is automatically applied each time a pair is looked-up in the database. 
Formulae can be renamed with a single call to the Prolog builtin copy_term. 
Join Calculating the join of the pairs $\langle a_1, f_1 \rangle$ and $\langle a_2, f_2 \rangle$ is complicated by the 
way that join interacts with renaming. Specifically, in a non-ground representation, 
call (answer) patterns would be typically stored in a dynamic database so 
that $\mathit{var}(a_1) \cap \mathit{var}(a_2) = \emptyset$. Hence $\langle a_1, f_1 \rangle$ (or equivalently $\langle a_2, f_2 \rangle$) have to be 
appropriately renamed before the join is calculated. This is achieved as follows. 
Plotkin's anti-unification algorithm [20] is used to compute the most specific 
atom a that generalises $a_1$ and $a_2$. The basic idea is to reformulate $a_1$ as a pair 
$\langle a_1', f_1' \rangle$ which satisfies two properties: $a_1'$ is a syntactic variant of a; the pair 
represents the same dependency information as $\langle a_1, \mathit{true} \rangle$. A pair $\langle a_2', f_2' \rangle$ is 
likewise constructed that is a reformulation of $a_2$. The atoms a, $a_1'$ and $a_2'$ are 
unified and then the formula $f = (f_1 \wedge f_1') \curlyvee (f_2 \wedge f_2')$ is calculated as described in 
section 3 to give the join $\langle a, f \rangle$. In actuality, the computation of $\langle a_1', f_1' \rangle$ and the 
unification $a = a_1'$ can be combined in a single pass as is outlined below. Suppose 
$a = p(t_1, \ldots, t_n)$ and $a_1 = p(s_1, \ldots, s_n)$. Let $g_0 = \mathit{true}$. For each $1 \leq k \leq n$, one 
of the following cases is selected. (1) If $t_k$ is syntactically equal to $s_k$, then put 
$g_k = g_{k-1}$. (2) If $s_k$ is bound to true, then put $g_k = g_{k-1} \wedge (t_k \leftarrow \mathit{true})$. (3) If 
$s_k \in \mathit{var}(\langle s_1, \ldots, s_{k-1} \rangle)$, then unify $s_k$ and $t_k$ and put $g_k = g_{k-1}$. (4) Otherwise, 
put $g_k = g_{k-1} \wedge (t_k \leftarrow s_k) \wedge (s_k \leftarrow t_k)$. Finally, let $f_1' = g_n$. The algorithm is 
applied analogously to bind variables in a and construct $f_2'$. The join of the pairs 
is then given by $\langle a, (f_1 \wedge f_1') \curlyvee (f_2 \wedge f_2') \rangle$. 

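Example 6. Consider the join of the GEP pairs $\langle a_1, \mathit{true} \rangle$ and $\langle a_2, y_1 \leftarrow y_2 \rangle$ 
where $a_1 = p(\mathit{true}, x_1, x_1, x_1)$ and $a_2 = p(y_1, y_2, \mathit{true}, \mathit{true})$. The most specific 
generalisation of $a_1$ and $a_2$ is $a = p(z_1, z_2, z_3, z_3)$. The table below illustrates the 
construction of $\langle a_1', f_1' \rangle$ and $\langle a_2', f_2' \rangle$ in the left- and right-hand columns. 

k  case  $g_k$                                     $\theta_k$              case'  $g_k'$                                        $\theta_k'$
0        $\mathit{true}$                           $\epsilon$                     $\mathit{true}$                               $\epsilon$
1  2     $z_1 \leftarrow \mathit{true}$            $\epsilon$              4      $y_1 \leftrightarrow z_1$                     $\epsilon$
2  4     $g_1 \wedge (z_2 \leftrightarrow x_1)$    $\theta_1$              4      $g_1' \wedge (y_2 \leftrightarrow z_2)$       $\theta_1'$
3  3     $g_2$                                     $\{x_1 \mapsto z_3\}$   2      $g_2' \wedge (z_3 \leftarrow \mathit{true})$  $\theta_2'$
4  1     $g_3$                                     $\theta_3$              2      $g_3' \wedge (z_3 \leftarrow \mathit{true})$  $\theta_3'$

Putting $\theta = \theta_4 \circ \theta_4' = \{x_1 \mapsto z_3\}$, the join is given by 
$\langle \theta(a), \theta(g_4 \wedge \mathit{true}) \curlyvee \theta(g_4' \wedge (y_1 \leftarrow y_2)) \rangle$ 
$= \langle a, ((z_1 \leftarrow \mathit{true}) \wedge (z_2 \leftrightarrow z_3)) \curlyvee ((y_1 \leftrightarrow z_1) \wedge (y_2 \leftrightarrow z_2) \wedge (z_3 \leftarrow \mathit{true}) \wedge (y_1 \leftarrow y_2)) \rangle$ 
$= \langle p(z_1, z_2, z_3, z_3), (z_1 \leftarrow z_2) \wedge (z_3 \leftarrow z_2) \rangle$. 

Note that often $a_1$ is a variant of $a_2$. This can be detected with a lightweight 
variance check, enabling join and renaming to be reduced to unifying $a_1$ and $a_2$ 
and computing $f = f_1 \curlyvee f_2$ to give the pair $\langle a_1, f \rangle$. 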
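The anti-unification step and the single-pass reformulation (cases 1 to 4) can be sketched in Python for flat atoms. This is a sketch under stated assumptions, not the Prolog code of the analyser: atoms are `(name, args)` tuples, variables are strings, the constant true is the string `"true"`, clauses are `(head, body)` pairs with frozenset bodies, and the fresh variable names `z1, z2, ...` are assumed not to clash with existing ones. A dictionary with dereferencing stands in for Prolog unification.

```python
def anti_unify(a1, a2):
    """Most specific generalisation of two flat atoms (Plotkin-style):
    equal arguments are kept, each distinct pair gets one fresh variable."""
    (p, args1), (q, args2) = a1, a2
    assert p == q and len(args1) == len(args2)
    fresh, out = {}, []
    for s, t in zip(args1, args2):
        if s == t:
            out.append(s)
        else:
            out.append(fresh.setdefault((s, t), f"z{len(fresh) + 1}"))
    return (p, tuple(out))

def deref(theta, v):
    """Follow variable bindings to their representative."""
    while v in theta:
        v = theta[v]
    return v

def reformulate(a, ai):
    """One pass over the arguments of ai against the generalisation a.
    Returns the clauses of g_n (with theta applied) and theta itself."""
    (_, ts), (_, ss) = a, ai
    theta, seen, g = {}, set(), []
    for t, s in zip(ts, ss):
        if deref(theta, s) == deref(theta, t):     # case 1: syntactically equal
            pass
        elif s == "true":                          # case 2: t <- true
            g.append((t, frozenset()))
        elif s in seen:                            # case 3: unify s and t
            theta[deref(theta, s)] = deref(theta, t)
        else:                                      # case 4: t <-> s
            g.append((t, frozenset([s])))
            g.append((s, frozenset([t])))
        seen.add(s)
    rename = lambda v: deref(theta, v)
    return [(rename(h), frozenset(map(rename, b))) for h, b in g], theta

# Example 6
a1 = ("p", ("true", "x1", "x1", "x1"))
a2 = ("p", ("y1", "y2", "true", "true"))
a = anti_unify(a1, a2)
g_left, theta_left = reformulate(a, a1)
g_right, theta_right = reformulate(a, a2)
print(a, theta_left)
```

On Example 6 the left-hand pass yields the clauses for $(z_1 \leftarrow \mathit{true}) \wedge (z_2 \leftrightarrow z_3)$ with the binding of $x_1$ to $z_3$, matching the left-hand columns of the table.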



Projection Projection is only applied to formulae. Each of the variables to be 
projected out is eliminated in turn, as follows. Suppose x is to be projected out of 
f. First, all those clauses with x as their head are found, giving $\{x \leftarrow X_i \mid i \in I\}$ 
where I is a (possibly empty) index set. Second, all those clauses with x in the 
body are found, giving $\{y \leftarrow Y_j \mid j \in J\}$ where J is a (possibly empty) index 
set. Thirdly these clauses of f are replaced by $\{y \leftarrow Z_{i,j} \mid i \in I \wedge j \in J \wedge Z_{i,j} = 
X_i \cup (Y_j \setminus \{x\}) \wedge y \notin Z_{i,j}\}$ (syllogizing). Fourthly, a compact representation 
is maintained by eliminating redundant clauses (absorption). By appropriately 
ordering the clauses, all four steps can be performed in a single pass over f. A 
final pass over f retracts clauses such as $x \leftarrow \mathit{true}$ by binding x to true and also 
removes clause pairs such as $y \leftarrow z$ and $z \leftarrow y$ by unifying y and z. 
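The syllogizing and absorption steps can be illustrated as follows. This Python sketch assumes clauses are `(head, body)` pairs with frozenset bodies; it performs the steps as separate set operations rather than in the single ordered pass described above, and it omits the final pass that binds and unifies variables. The name `project_out` is hypothetical.

```python
def project_out(clauses, x):
    """Eliminate variable x from a set of definite clauses."""
    clauses = set(clauses)
    # Step 1: bodies X_i of the clauses x <- X_i (tautologies x <- ...x... dropped)
    defs = [b for h, b in clauses if h == x and x not in b]
    # Step 2: clauses y <- Y_j with x in the body
    uses = [(h, b) for h, b in clauses if h != x and x in b]
    # Clauses that do not mention x at all are kept unchanged
    kept = {(h, b) for h, b in clauses if h != x and x not in b}
    # Step 3 (syllogizing): y <- X_i u (Y_j \ {x}), skipping tautologies
    for y, Yj in uses:
        for Xi in defs:
            Z = Xi | (Yj - {x})
            if y not in Z:
                kept.add((y, Z))
    # Step 4 (absorption): drop a clause if a same-headed clause has a smaller body
    return {(h, b) for h, b in kept
            if not any(h2 == h and b2 < b for h2, b2 in kept)}

f = {("x", frozenset({"y"})), ("y", frozenset({"z"}))}
print(project_out(f, "y"))

g = {("x", frozenset({"u"})), ("w", frozenset({"x", "v"})), ("w", frozenset({"v"}))}
print(project_out(g, "x"))
```

In the second call the resolvent $w \leftarrow u \wedge v$ is generated and then absorbed by the existing clause $w \leftarrow v$.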



Entailment Entailment checking is only applied to formulae. A forward chaining 
decision procedure for propositional Horn clauses (and hence Def) is used 
to test entailment. A non-ground representation allows chaining to be implemented 
efficiently using block declarations. To check that $\bigwedge_{i=1}^{n} (y_i \leftarrow Y_i)$ entails 
$z \leftarrow Z$ the variables of Z are first grounded. Next, a process is created for each 
clause $y_i \leftarrow Y_i$ that blocks until $Y_i$ is ground. When $Y_i$ is ground, the process 
resumes and grounds $y_i$. If z is ground after a single pass over the clauses, then 
$\bigwedge_{i=1}^{n} (y_i \leftarrow Y_i) \models z \leftarrow Z$. By calling the check under negation, no problematic 
bindings or suspended processes are created. 
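The entailment test, and its use as a filter in front of join, can be sketched in Python. The sketch replaces the block declarations of the Prolog implementation with an explicit fixpoint loop, so it illustrates the decision procedure rather than the coroutining mechanism; clauses are assumed to be `(head, body)` pairs with frozenset bodies, and `entails`, `entails_formula` and `filtered_join` are hypothetical names.

```python
def entails(clauses, goal):
    """Forward chaining: does the conjunction of clauses entail z <- Z?"""
    z, Z = goal
    ground = set(Z)                     # ground the variables of Z
    changed = True
    while changed and z not in ground:
        changed = False
        for h, b in clauses:
            if h not in ground and b <= ground:
                ground.add(h)           # body is ground, so the head becomes ground
                changed = True
    return z in ground

def entails_formula(f1, f2):
    """f1 |= f2 iff f1 entails every clause of f2."""
    return all(entails(f1, c) for c in f2)

def filtered_join(f1, f2, join):
    """Call the (expensive) join only when neither operand entails the other."""
    if entails_formula(f1, f2):
        return f2
    if entails_formula(f2, f1):
        return f1
    return join(f1, f2)

f1 = [("x", frozenset({"y"})), ("y", frozenset({"z"}))]
print(entails(f1, ("x", frozenset({"z"}))))   # transitive dependency
print(entails(f1, ("z", frozenset({"x"}))))

def never_called(a, b):
    raise AssertionError("join should have been filtered out")

h1 = [("x", frozenset()), ("y", frozenset({"x"}))]   # x /\ (y <- x)
h2 = [("y", frozenset())]                            # y
print(filtered_join(h1, h2, never_called) == h2)
```

The filter relies on the observation that if $f_1 \models f_2$ then the join is simply $f_2$, which the frequency analysis suggests is the common case.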



5 Experimental Evaluation 

A Def analyser using the non-ground techniques described in this paper has been 
implemented. This implementation is built in Prolog using the same induced 
magic framework as for the BDD-based Def analyser, therefore the analysers 
work in lock step and generate the same results. (The only difference is that 
the non-ground analyser does not implement environment trimming since the 
representation is far less sensitive to the number of variables in a clause.) The 
core of the analyser (the fixpoint engine) is approximately 400 lines of code and 
took one working week to write, debug and tune. 

In order to investigate whether entailment checking, the join ($\curlyvee$) algorithm, 
and the GEP representation are enough to obtain a fast and scalable analysis, 
the non-ground analyser was compared with the BDD-based analyser for speed 
and scalability. Since King et al [18] do not give precision results for Pos for 
larger benchmarks, we have also implemented a BDD-based Pos analyser in 
the same vein, so that firmer conclusions about the relative precision of Def 
and Pos can be drawn. It is reported in [2], [3] that a hybrid implementation 
of ROBDDs, separating maintenance of definiteness information and of various 
forms of dependency information can give significantly improved performance. 
Therefore, it is to be expected that an analyser based on such an implementation 
of ROBDDs would be faster than that used here. 

The comparisons focus on goal-dependent groundness analysis of 60 Prolog 
and CLP($\mathcal{R}$) programs. The results are given in the table below. In this table, 
the size column gives the number of distinct (abstract) clauses in the programs. 
The abs column gives the time for parsing the files and abstracting them, that 
is, replacing built-ins, such as arg(x, t, s), with formulae, such as $x \wedge (s \leftarrow t)$. 
                                   --------- fixpoint ---------   -- precision --
file                  size    abs    Def_NG   Def_BDD       Pos     Def    Pos      %
rotate.pl                3      ?      0.00      0.00         ?       ?      ?     50
circuit.clpr            20   0.02      0.02      0.03      0.02       3      3      0
air.clpr                20   0.02      0.02      0.03      0.02       9      9      0
dnf.clpr                23   0.02      0.01      0.01      0.01       8      8      0
dcg.pl                  23   0.02      0.01      0.01      0.02      59     59      0
hamiltonian.pl          23   0.02      0.01      0.01      0.01      37     37      0
poly10.pl               29   0.02      0.00      0.00      0.01       0      0      0
semi.pl                 31   0.03      0.03      0.28      0.28      28     28      0
life.pl                 32   0.02      0.01      0.02      0.02      58     58      0
rings-on-pegs.clpr      34   0.02      0.02      0.04      0.04      11     11      0
meta.pl                 35   0.01      0.01      0.02      0.01       1      1      0
browse.pl               36   0.02      0.01      0.02      0.02      41     41      0
gabriel.pl              38   0.02      0.01      0.03      0.03      37     37      0
tsp.pl                  38   0.03      0.01      0.04      0.04     122    122      0
nandc.pl                40   0.03      0.01      0.03      0.03      37     37      0
csg.clpr                48   0.04      0.01      0.02      0.02      12     12      0
disi_r.pl               48   0.02      0.01      0.04      0.04      97     97      0
ga.pl                   48   0.06      0.01      0.04      0.04     141    141      0
critical.clpr           49   0.03      0.03      0.04      0.04      14     14      0
sccl.pl                 51   0.03      0.01      0.06      0.04      89     89      0
mastermind.pl           53   0.04         ?         ?         ?       ?      ?      ?
ime_v2-2-1.pl           53   0.04      0.03      0.09      0.08     101    101      0
robot.pl                53   0.03      0.00      0.01      0.01      41     41      0
cs_r.pl                 54   0.05      0.01      0.04      0.04     149    149      0
tictactoe.pl            56   0.06      0.01      0.03      0.04      60     60      0
flatten.pl              56   0.03      0.04      0.09      0.08      27     27      0
dialog.pl               61   0.02      0.01      0.03      0.03      70     70      0
map.pl                  66   0.02      0.01      0.08      0.08      17     17      0
neural.pl               67   0.05      0.01      0.05      0.05     123    123      0
bridge.clpr             69   0.08      0.01      0.02      0.03      24     24      0
conman.pl               71   0.04         ?         ?         ?       6      6      0
kalah.pl                78   0.04      0.02      0.04      0.04     199    199      0
unify.pl                79   0.04      0.07      0.12      0.10      70     70      0
nbody.pl                85   0.07      0.06      0.10      0.11     113    113      0
peep.pl                 86   0.11      0.03      0.06      0.05      10     10      0
boyer.pl                95   0.06      0.04      0.04      0.05       3      3      0
bryant.pl               95   0.07      0.20      0.15      0.15      99     99      0
sdda.pl                 99   0.05      0.06      0.06      0.06      17     17      0
read.pl                105   0.07      0.06      0.11      0.10      99     99      0
press.pl               109   0.07      0.11      0.16      0.18      53     53      0
qplan.pl               109   0.08      0.02      0.08      0.07     216    216      0
trs.pl                 111   0.11      0.11      0.31      0.60      13     13      0
reducer.pl             113   0.07      0.11      0.16      0.14      41     41      0
simple_analyzer.pl     140   0.09      0.13      0.34      0.44      89     89      0
dbqas.pl               146   0.09      0.02      0.05      0.05      43     43      0
ann.pl                 148   0.09      0.11      0.24      0.23      74     74      0
asm.pl                 175   0.14      0.06      0.14      0.13      90     90      0
nand.pl                181   0.12      0.04      0.21      0.19     402    402      0
rubik.pl               219   0.16      0.15      0.39      0.40     158    158      0
lnprolog.pl            221   0.10      0.08      0.14      0.14     143    143      0
yh.pl                  225   0.15         ?         ?         ?       ?      ?      0
sim.pl                 249   0.18      0.39      0.56      0.52     100    100      0
strips.pl              261   0.17      0.01      0.11      0.11     142    142      0
chat_parser.pl         281   0.21      0.45      0.59      0.60     505    505      0
sim_v5-2.pl            288   0.17      0.05      0.20      0.20     455    457    0.4
peval.pl               328   0.16      0.28      0.27      0.27      27     27      0
aircraft.pl            397   0.48      0.14      0.55      0.59     687    687      0
essln.pl               565   0.36      0.21      0.58      0.58     163    163      0
chat_80.pl             888   0.92      1.31      1.89      2.27     855    855      0
aqua_c.pl             4009   2.48     11.29    104.99    897.10    1288   1288      0
The abstracter deals with meta-calls, asserts and retracts following the
elegant (two program) scheme detailed by Bueno et al [6]. The fixpoint columns
give the time, in seconds, to compute the fixpoint for each of the three
analysers (Def^NG and Def^BDD denote respectively the non-ground and BDD-based
Def analysers). The precision columns give the total number of ground arguments
in the call and answer patterns (and exclude those ground arguments for
predicates introduced by normalising the program into definite clauses). The %
column expresses the loss of precision by Def relative to Pos. All three
analysers were coded in SICStus 3.7 and the experiments were performed on a
296MHz Sun UltraSPARC-II with 1GByte of RAM running Solaris 2.6.
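
As a small illustration of how the % column can be read, the following Python sketch computes a relative precision loss from the two precision columns. The formula (ground arguments missed by Def, as a percentage of those found by Pos) is an assumption that happens to agree with the sim_v5-2.pl row (455 versus 457 giving 0.4); the function name is hypothetical.

```python
def precision_loss(def_ground: int, pos_ground: int) -> float:
    """Percentage of ground arguments found by Pos but missed by Def.

    Assumed reading of the % column; not taken from the paper's text.
    """
    if pos_ground == 0:
        return 0.0
    return 100.0 * (pos_ground - def_ground) / pos_ground


# sim_v5-2.pl: Def finds 455 ground arguments, Pos finds 457.
loss_sim = round(precision_loss(455, 457), 1)
# chat_80.pl: both analysers find 855 ground arguments.
loss_chat = precision_loss(855, 855)
```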

The experimental results indicate the precision of Def is close to that of
Pos. Although rotate.pl is small it has been included in the table because it
was the only program for which significant precision was lost. Thus, whilst it
is always possible to construct programs in which disjunctive dependency
information (which cannot be traced in Def) needs to be tracked to maintain
precision, these results suggest that Def is adequate for top-down groundness
analysis of many programs.

The speed of the non-ground Def analyser compares favourably with both 
the BDD analysers. This is surprising because the BDD analysers make use 
of hashing and memoisation to avoid repeated work. In the non-ground Def 
analyser, the repeated work is usually in meet and entailment checking, and these 
operations are very lightweight. In the larger benchmarks, such as aqua_c.pl, the 
BDD analysis becomes slow as the BDDs involved are necessarily large. Widening 
for BDDs can make such examples more manageable [15]. Notice that the time 
spent in the core analyser (the fixpoint engine) is of the same order as that spent 
in the abstracter. This suggests that a large speed up in the analysis time needs 
to be coupled with a commensurate speedup in the abstracter. 

To give an initial comparison with the Sharing-based Def analyser of King et
al [18], the clock speed of the Sparc-20 used in the Sharing experiments has
been used to scale the results in this paper. These findings lead to the
preliminary conclusion that the analysis presented in this paper is about twice
as fast as the Sharing quotient analyser. Furthermore, the Sharing quotient
analyser relies on widening to keep the abstractions small, hence may sacrifice
some precision for speed.

6 Related Work 

Van Hentenryck et al [21] is an early work which laid a foundation for BDD-based
Pos analysis. Corsini et al [11] describe how variants of Pos can be
implemented using Toupie, a constraint language based on the μ-calculus. If
this analyser were extended with, say, magic sets, it might lead to a very
respectable goal-dependent analysis. More recently, Bagnara and Schachte [3]
have developed the idea [2] that a hybrid implementation of a ROBDD that keeps
definite information separate from dependency information is more efficient
than keeping the two together. This hybrid representation can significantly
decrease the size of an ROBDD and thus is a useful implementation tactic.
Armstrong et al [1] study a number of different representations of Boolean
function for both Def and Pos. An empirical evaluation on 15 programs suggests
that specialising Dual Blake Canonical Form (DBCF) for Def leads to the fastest
analysis overall. This representation of a Def function f is in orthogonal form
since it is constructed from all the prime consequents that are entailed by f.
It thus includes redundant transitive dependencies. Armstrong et al [1] also
perform interesting precision experiments. Def and Pos are compared, however,
in a bottom-up framework that is based on condensing, which is therefore biased
towards Pos. The authors point out that a top-down analyser would improve
the precision of Def relative to Pos and our work supports this remark.

Garcia de la Banda et al [16] describe a Prolog implementation of Def that is 
also based on an orthogonal DBCF representation (though this is not explicitly 
stated) and show that it is viable for some medium sized benchmarks. Fecht [15] 
describes another groundness analyser that is not coded in C. Fecht adopts ML 
as a coding medium in order to build an analyser that is declarative and easy to 
maintain. He uses a sophisticated fixpoint solver and his analysis times compare 
favourably with those of Van Hentenryck et al [21]. 

Codish and Demoen [8] describe a non-ground model based implementation
technique for Pos that would encode x₁ ↔ (x₂ ∧ x₃) as three tuples
⟨true, true, true⟩, ⟨false, _, false⟩, ⟨false, false, _⟩. Codish et al [9]
propose a sub-domain of Def that can only propagate dependencies of the form
(x₁ ↔ x₂) ∧ x₃ across procedure boundaries. The main finding of Codish et al
[9] is that this sub-domain loses only a small amount of precision for
goal-dependent analysis.
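
To make the non-ground model encoding concrete, the following Python sketch checks that three tuples with unbound positions cover exactly the models of x1 <-> (x2 and x3). It assumes the tuples are read as (true, true, true), (false, _, false) and (false, false, _), where _ stands for an unbound variable (encoded here as None); the helper names are hypothetical.

```python
from itertools import product


def models(f, arity):
    """All truth assignments (as tuples) that satisfy the Boolean function f."""
    return {bits for bits in product([True, False], repeat=arity) if f(*bits)}


def instances(pattern):
    """Ground instances of a tuple in which None marks an unbound variable."""
    choices = [[True, False] if p is None else [p] for p in pattern]
    return set(product(*choices))


def formula(x1, x2, x3):
    return x1 == (x2 and x3)


# Assumed reading of the three tuples; None plays the role of "_".
encoding = [(True, True, True), (False, None, False), (False, False, None)]

covered = set()
for pattern in encoding:
    covered |= instances(pattern)

exact = covered == models(formula, 3)
```

The instances of the second and third tuples overlap on (false, false, false), which is harmless because the encoding denotes a set of models.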

King et al [18] show how the equivalence checking, meet and join of Def can 
be efficiently computed with a Sharing quotient. Widening is required to keep 
the representation manageable. 

Finally, a curious connection exists between the join algorithm described in
this paper and a relaxation that occurs in disjunctive constraint solving [14].
The relaxation computes the join (closure of the convex hull) of two polyhedra
P₁ and P₂ where Pᵢ = {x ∈ ℝⁿ | Aᵢx ≤ Bᵢ}. The join of P₁ and P₂ can be
expressed as:

    P = { x ∈ ℝⁿ | A₁ρ₁(x) ≤ B₁ ∧ A₂ρ₂(x) ≤ B₂ ∧
                   0 ≤ λ ≤ 1 ∧ x = λρ₁(x) + (1 − λ)ρ₂(x) }

which amounts to the same tactic of constructing join in terms of meet
(conjunction of linear equations), renaming (ρ₁ and ρ₂) and projection (the
variables of interest are x).



7 Future Work 

Initial profiling has suggested that a significant proportion of the analysis time is 
spent projecting onto (new) call and answer patterns, so recoding this operation 
might impact on the speed of the analysis. Also, a practical comparison with a 
DBCF analyser would be insightful. This is the immediate future work. In the 
medium term, it would be interesting to apply widening to obtain an analysis
with polynomial guarantees. Time complexity relates to the maximum number
of iterations of a fixpoint analysis and this, in turn, depends on the length
of the longest ascending chain in the underlying domain. For both Pos_X and
Def_X the longest chains have length 2ⁿ − 1 where |X| = n [18]. One way to
accelerate the analysis would be to widen call and answer patterns by
discarding the formulae component of the GEP representation if the number of
updates to a particular call or answer pattern exceeded, say, 8 [18]. The
abstraction then corresponds to an EPos_X function whose chain length is linear
in |X| [9]. Although widening for space is not as critical as in [18], this too
would be a direction for future work. In the long term, it would be interesting
to apply Def to other dependency analysis problems, for example, strictness
[13] and finiteness [5] analysis.
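
The widening policy suggested above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the analyser's code: it assumes that a GEP abstraction can be viewed as three components (ground variables, equivalences, remaining formulae), and the class, field and function names are all hypothetical. The threshold of 8 is the figure mentioned in the text.

```python
from dataclasses import dataclass

WIDEN_THRESHOLD = 8  # "say, 8" in the text


@dataclass(frozen=True)
class Pattern:
    ground: frozenset      # variables known to be ground (assumed component)
    equiv: frozenset       # pairs of equivalent variables (assumed component)
    formulae: frozenset    # remaining dependency formulae, kept opaque here
    updates: int = 0       # how often this call/answer pattern has changed


def update(old: Pattern, ground, equiv, formulae) -> Pattern:
    """Record a new value for a pattern, widening once it changes too often."""
    count = old.updates + 1
    if count > WIDEN_THRESHOLD:
        # Widening: drop the formulae component, which weakens the function
        # and leaves only the ground/equivalence information.
        formulae = frozenset()
    return Pattern(frozenset(ground), frozenset(equiv), frozenset(formulae), count)


def longest_chain(n: int) -> int:
    """Longest ascending chain for Pos_X and Def_X when |X| = n, as cited."""
    return 2 ** n - 1


p = Pattern(frozenset(), frozenset(), frozenset({"x -> y"}))
history = []
for _ in range(9):
    p = update(p, {"z"}, set(), {"x -> y"})
    history.append(len(p.formulae))
```

After eight updates the formulae are still retained; the ninth update triggers the widening.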

The frequency analysis which has been used in this paper to tailor the costs 
of the abstract operations to the frequency with which they are called could be 
applied to other analyses, such as type, freeness or sharing analyses. 

8 Conclusions 

The representation and abstract operations for Def have been chosen by
following a strategy. The strategy was to design an implementation so as to
ensure that the most frequently called operations are the most lightweight.
Previously unexploited computational properties of Def have been used to avoid
expensive joins (and projections) through entailment checking; and to keep
abstractions small by reformulating join in such a way as to avoid orthogonal
reduced monotonic body form. The join algorithm has other applications such as
computing the downward closure operator that arises in BDD-based set sharing
analysis.

By combining the techniques described in this paper, an analyser has been 
constructed that is precise, can be implemented easily in Prolog, and whose 
speed compares favourably with BDD-based analysers. 
Abstract. Type specialisation, like partial evaluation, is an approach to
specialising programs. But type specialisation works in a very different 
way, using a form of type inference. Previous articles have described the 
method and demonstrated its power as a program transformation, but its 
correctness has not previously been addressed. Indeed, it is not even clear 
what correctness should mean: type specialisation transforms programs 
to others with different types, so clearly cannot preserve semantics in the 
usual sense. 

In this paper we explain why finding a correctness proof was difficult, we 
motivate a correctness condition, and we prove that type specialisation 
satisfies it. Perhaps unsurprisingly, type-theoretic methods turned out to 
crack the nut. 



1 Introduction 

Type specialisation, like partial evaluation, is an approach to specialising
programs [13]. While partial evaluation focusses on specialising the control
structures of a program, type specialisation focusses on transforming the
datatypes. A type specialiser can produce programs operating on quite different
types from the source program, and as a result achieve very strong
specialisations. Earlier papers contain many illustrations of the power of the
method [10,9,12,11,4].

However, these earlier papers do not address the correctness of the method:
are the programs which type specialisation produces equivalent to those they
are derived from? This question is harder to answer for type specialisation than
for partial evaluation for two reasons. Firstly, since the type specialiser
changes types, it is not even clear what ‘equivalent’ means. Secondly, for the
most part, a partial evaluator applies a sequence of small semantics-preserving
transformations whose correctness is obvious, but the type specialiser is
described by axiomatising the relation between source and residual programs in
one go. Thus there is more scope for error. Indeed, it transpires that the type
specialiser does not preserve semantics, but we are able to prove a weaker
result which is ‘good enough’.

In this paper, we present our proof of correctness. We shall begin by reviewing 
type specialisation, and explaining the problems which foiled our earlier attempts 
to find a proof. Then we explain what we actually prove, which is an analogue 
of subject reduction. Finally, we will present some of the cases of the proof in 
detail. 

G. Smolka (Ed.): ESOP/ETAPS 2000, LNCS 1782, pp. 215-229, 2000. 

© Springer-Verlag Berlin Heidelberg 2000
2 What is Type Specialisation?

Type specialisation transforms a typed source program into a typed residual
program, and in contrast to partial evaluation, types play a major role during
the transformation itself. Both source and residual programs are simply typed,
but they are expressed in different languages, and their types are used for
different purposes. In Figure 1 we specify the syntax of terms and types for a
small language we will study first.



    e ::= n | e + e                     e' ::= • | n
        | lift e
        | x | λx.e | e@e                     | x | λx.e' | e' e'
        | fix e                              | fix e'

    τ ::= int | i̲n̲t̲ | τ → τ             τ' ::= n | int | τ' → τ'

Fig. 1. Source and Residual Languages.



The source language is a form of two-level λ-calculus: constructions may come
in two forms, static or dynamic, with the dynamic form indicated by underlining.
Similarly, types may be either static or dynamic. In the figure, we consider only
static integers (constants or additions), dynamic integers (formed by applying
lift to static ones), and dynamic functions (λ-expressions, dynamic application,
and dynamic fix). The typing rules for this fragment are given in Figure 2.



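    Γ ⊢ n : int                    Γ, x : τ ⊢ x : τ

    Γ ⊢ e₁ : int    Γ ⊢ e₂ : int           Γ ⊢ e : int
    ────────────────────────────        ─────────────────
         Γ ⊢ e₁ + e₂ : int               Γ ⊢ lift e : i̲n̲t̲

      Γ, x : τ₁ ⊢ e : τ₂           Γ ⊢ e₁ : τ₁ → τ₂    Γ ⊢ e₂ : τ₁
    ──────────────────────         ────────────────────────────────
     Γ ⊢ λx.e : τ₁ → τ₂                    Γ ⊢ e₁@e₂ : τ₂

     Γ ⊢ e : τ → τ
    ────────────────
     Γ ⊢ fix e : τ

Fig. 2. Source Typing Rules.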



There are two subtleties here, however. Firstly, in contrast to other two-level
λ-calculi, we do not restrict the formation of two-level types in any way.
For example, we allow dynamic functions to take static values as arguments,
and return static results, which is forbidden in the context of partial
evaluation. The reason is simply that the type specialiser is able to specialise
such programs, while partial evaluators are not. Intuitions from other
specialisers therefore lead one astray here: the reason that no restrictions on
type formation are stated is not that we have forgotten them, but that there
are indeed no restrictions.

Secondly, we interpret the syntax of types co-inductively. That is, types may 
be infinite expressions conforming to this syntax. This is the way in which we 
handle recursive types: they are represented as their infinite unfolding, and no 
special construction for type recursion is required. This is particularly useful 
for residual types, since it allows the specialiser to synthesize recursive types 
freely. While recursive types correspond only to regular infinite types, we need 
no assumption of regularity in the proofs which follow. Recursive types are of 
little use in the fragment in the Figure, but when we later extend the language 
we consider they will of course play their usual useful role. 

The residual language is also a form of simply typed λ-calculus, but with a
rich type system in which types carry static information. Thus there is a residual
type n for every integer n; a static integer expression in the source language
specialises to a residual expression with such a type. All static information is
expressed via residual types, and as a result need not be present in residual
terms. This explains the residual term •, which stands for ‘no value’: we can
specialise 2 + 2 for example to •, since the residual type (4) already tells us all
we need to know about the result. Type specialisation produces many residual
expressions of this sort, but they are easy to remove in a post-processor we call
the ‘void eraser’. The typing rules for residual terms are given in Figure 3.



    Γ ⊢ • : n          Γ ⊢ n : int          Γ, x : τ' ⊢ x : τ'

     Γ, x : τ'₁ ⊢ e' : τ'₂        Γ ⊢ e'₁ : τ'₁ → τ'₂   Γ ⊢ e'₂ : τ'₁       Γ ⊢ e' : τ' → τ'
    ────────────────────────     ────────────────────────────────────     ──────────────────
     Γ ⊢ λx.e' : τ'₁ → τ'₂               Γ ⊢ e'₁ e'₂ : τ'₂                  Γ ⊢ fix e' : τ'

Fig. 3. Residual Typing Rules.

Type specialisation is specified via a set of specialisation rules, analogous to
typing rules. Specialisation rules let us infer specialisation judgements, of the
form

    Γ ⊢ e : τ ↪ e' : τ'

meaning that source expression e of type τ specialises to residual expression e'
of type τ'. The context Γ contains assumptions about source variables, of the
form

    x : τ ↪ e' : τ'

Notice that variables may specialise to any residual expression; they do not have
to specialise to variables.
    Γ ⊢ n : int ↪ • : n

    Γ ⊢ eᵢ : int ↪ e'ᵢ : nᵢ  (i = 1, 2)    n = n₁ + n₂
    ──────────────────────────────────────────────────
             Γ ⊢ e₁ + e₂ : int ↪ • : n

       Γ ⊢ e : int ↪ e' : n
    ──────────────────────────────
    Γ ⊢ lift e : i̲n̲t̲ ↪ n : int

    Γ, x : τ ↪ e' : τ' ⊢ x : τ ↪ e' : τ'

     Γ, x : τ₁ ↪ x' : τ'₁ ⊢ e : τ₂ ↪ e' : τ'₂
    ───────────────────────────────────────────    x' ∉ FV(Γ)
    Γ ⊢ λx.e : τ₁ → τ₂ ↪ λx'.e' : τ'₁ → τ'₂

    Γ ⊢ e₁ : τ₁ → τ₂ ↪ e'₁ : τ'₁ → τ'₂    Γ ⊢ e₂ : τ₁ ↪ e'₂ : τ'₁
    ──────────────────────────────────────────────────────────────
                  Γ ⊢ e₁@e₂ : τ₂ ↪ e'₁ e'₂ : τ'₂

    Γ ⊢ e : τ → τ ↪ e' : τ' → τ'
    ──────────────────────────────
     Γ ⊢ fix e : τ ↪ fix e' : τ'

Fig. 4. Specialisation Rules.



The specialisation rules for the fragment we are considering here are given
in Figure 4. Using these rules we can conclude, for example, that

    ⊢ (λx. lift (x + 1))@2 : i̲n̲t̲ ↪ (λx'. 3) • : int

The 2 : int specialises to • : 2, which forces the type of x' to be 2. Consequently
x + 1 : int specialises to • : 3, and the lift moves this static information back
into the term, specialising to 3 : int. Void erasure in this case elides both • and
λx', resulting in just 3 as the final specialised program.
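
The derivation just described can be replayed mechanically. The following Python sketch is an illustration only, not the implemented specialiser: it covers static integers, +, lift, variables, and applications whose function part is literally a λ-expression, so that the residual type of the bound variable is known from the argument without any type inference (the real specialiser needs no such restriction). The tuple encoding of terms and the names `specialise` and `VOID` are assumptions made for this example.

```python
VOID = ("void",)  # the residual term written as a bullet in the text


def specialise(e, env):
    """Return (residual term, residual type) for a source term e.

    env maps each source variable to its assumed (residual term, residual type).
    Residual types: ("static", n) carries the integer n; ("int",) is dynamic.
    """
    tag = e[0]
    if tag == "num":                      # n : int specialises to void : n
        return VOID, ("static", e[1])
    if tag == "add":                      # static addition is done in the types
        _, t1 = specialise(e[1], env)
        _, t2 = specialise(e[2], env)
        assert t1[0] == "static" and t2[0] == "static"
        return VOID, ("static", t1[1] + t2[1])
    if tag == "lift":                     # move static information into the term
        _, t = specialise(e[1], env)
        assert t[0] == "static"
        return ("num", t[1]), ("int",)
    if tag == "var":
        return env[e[1]]
    if tag == "app" and e[1][0] == "lam":  # simplification: immediate redex only
        arg_term, arg_type = specialise(e[2], env)
        _, x, body = e[1]
        x_res = x + "'"
        body_env = {**env, x: (("var", x_res), arg_type)}
        body_term, body_type = specialise(body, body_env)
        return ("app", ("lam", x_res, body_term), arg_term), body_type
    raise NotImplementedError(tag)


# (\x. lift (x + 1)) @ 2
example = ("app",
           ("lam", "x", ("lift", ("add", ("var", "x"), ("num", 1)))),
           ("num", 2))
residual = specialise(example, {})
```

Here `residual` is the encoding of (λx'. 3) • at residual type int; a void eraser would then reduce it to 3.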

Note that the residual type system is more restrictive than the source one, 
so that well-typed source programs may fail to specialise. For example, the term 

    (λf. f@2 + f@3)@(λx. x + 1)

cannot be specialised, because x would need to be assigned both residual types 
2 and 3. This is perfectly natural: when we introduce the possibility to specialise 
types, we also introduce the possibility to do so inconsistently at different points 
in the program. 

Using types to carry static information enables us to specialise more programs
than a partial evaluator can. For example,

    ⊢ (λf. f@2)@(λx. lift (x + 1)) : i̲n̲t̲ ↪ (λf'. f' •) (λx'. 3) : int

where x' must have type 2 to match the call of f', and so the body of f'
specialises to 3. Here we can specialise the body of λx. lift (x + 1), even
though it does not appear in an application. A partial evaluator, such as λ-MIX
[6] or Similix [2], would need to contract at least the outer β-redex in order
to propagate a static argument to x, but since this is a dynamic β-redex this
is forbidden; this program is not well annotated for partial evaluation, but
causes the type specialiser no problems. In larger programs where it is
important not to unfold certain function calls, this capability gives the type
specialiser substantially more power.

There is much more to the type specialiser than this, but we will introduce 
further features later, along with their proofs of correctness. 

3 Why is Correctness Difficult? 

Of course, we would like to know that specialisation does not change the
semantics of programs; residual programs should be equivalent to the source
programs they were derived from. Yet we cannot hope to prove this for the type
specialiser. The very essence of the type specialiser is that it changes types.
The source and residual programs in general have quite different types, and so
they lie in different semantic domains: we certainly cannot expect them to be
equal. For example, 42 specialises to •, and of course these are different.

However, we note that dynamic type constructors always specialise to one-level
versions of themselves; in our fragment this refers to i̲n̲t̲ and →. Thus, if
the type of an expression involves only these constructors, then it will specialise
to a residual expression with an isomorphic type. Thus we might hope to prove
equivalence in this case.

Unfortunately, it doesn’t hold. Consider the source term lift (fix (λx.x)),
which clearly denotes ⊥. If we assume x : int ↪ x' : 42, then we can specialise
λx.x to λx'.x' : 42 → 42, and so specialise the fixpoint to a term with type 42.
Now the rule for lift lets us specialise the entire term to 42 : int, which is clearly
not equivalent to the source expression. In this case the implemented specialiser
would not actually choose this specialisation, but we can force it to exhibit
similar behaviour by supplying slightly more complex terms. For example,

    lift (fix (λx.if true then x else 42))

specialises to 42, but denotes ⊥.

Instead of equivalence, therefore, we will aim to prove that the source term
approximates the residual one. That is, the type specialiser may transform
non-terminating programs into terminating ones, but it will never transform a
terminating program into one which produces a different answer. Many program
transformations behave similarly, so we will consider this weaker correctness
property to be acceptable.

4 Outline of the Proof 

Since type specialisation is modelled closely on type inference, it is perhaps not 
so surprising that type theoretic methods turn out to be useful. We will prove 
the correctness of the specialiser by showing a kind of subject reduction result.
We will define source and residual reduction relations, both of which we write
as →, and then we will prove

Theorem (Simulation). If Γ ⊢ e₁ : τ ↪ e'₁ : τ' and e₁ → e₂, then there
exists an e'₂ such that Γ ⊢ e₂ : τ ↪ e'₂ : τ' and e'₁ →* e'₂.

By this theorem we know that if e₁ eventually reduces to a value, then e'₁
reduces to the specialisation of a value. By

Lemma (Value Specialisation). If Γ ⊢ v : τ ↪ e' : τ' (where v is a
source value), then e' is a residual value.

then the following correctness theorem follows:

Theorem (Correctness). If Γ ⊢ e : τ ↪ e' : τ' and e reduces to a value,
then so does e'.

In order to prove the Simulation theorem, we will need two lemmata about
substitution: two, because we have two kinds of variables, and therefore
two kinds of substitution. The lemma for source substitution is

Lemma (Source Substitution). If Γ ⊢ e₁ : τ₁ ↪ e'₁ : τ'₁ and
Γ, x : τ₁ ↪ e'₁ : τ'₁ ⊢ e₂ : τ₂ ↪ e'₂ : τ'₂, then Γ ⊢ e₂[e₁/x] : τ₂ ↪ e'₂ : τ'₂.

No substitution is required into the residual term, because specialisation itself
substitutes e'₁ for x.

The residual substitution lemma is even simpler.

Lemma (Residual Substitution). Let θ be a substitution of residual terms
for the residual variables occurring free in Γ. If Γ ⊢ e : τ ↪ e' : τ', then
Γθ ⊢ e : τ ↪ e'θ : τ'.

We prove both these lemmata, and the Simulation theorem, by induction over
the structure of source terms. In the next section we present the proofs for the
fragment we are currently considering, and then in later sections we show the
cases for extensions to this fragment.

5 The Correctness of the Fragment 

Before we go further we must define reduction relations for the source and
target languages. We do so in Figure 5; the reduction relations are the smallest
congruences satisfying the stated properties. By a value we mean a closed weak
head normal form: the values in the source language take the form n, lift n, or
λx.e, while the values in the residual language take the form •, n or λx.e'. The
Value Specialisation lemma now follows directly, by applying the appropriate
specialisation rule to each form of source value. We now prove the substitution
lemmata and the Simulation theorem in turn.
    n₁ + n₂ → n   if n = n₁ + n₂

    (λx.e₁)@e₂ → e₁[e₂/x]          (λx.e'₁) e'₂ → e'₁[e'₂/x]

    fix e → e@(fix e)              fix e' → e' (fix e')

Fig. 5. Source and Residual Reduction Rules.
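
The reduction rules of Figure 5 can be sketched as a one-step reducer. The Python below is an illustration under stated assumptions rather than part of the formal development: terms are encoded as tuples, residual application is encoded with the same "app" tag as source application, and substitution is capture-naive, which is only safe here because the substituted terms in the examples are closed. All function names are hypothetical.

```python
def subst(e, x, v):
    """e[v/x], without renaming bound variables (assumes v is closed)."""
    tag = e[0]
    if tag == "num":
        return e
    if tag == "var":
        return v if e[1] == x else e
    if tag == "lam":
        return e if e[1] == x else ("lam", e[1], subst(e[2], x, v))
    return (tag,) + tuple(subst(s, x, v) for s in e[1:])


def step(e):
    """Perform one reduction step somewhere in e, or return None."""
    tag = e[0]
    if tag == "add" and e[1][0] == "num" and e[2][0] == "num":
        return ("num", e[1][1] + e[2][1])          # n1 + n2 -> n
    if tag == "app" and e[1][0] == "lam":
        _, x, body = e[1]
        return subst(body, x, e[2])                # beta-reduction
    if tag == "fix":
        return ("app", e[1], e)                    # fix e -> e @ (fix e)
    for i in range(1, len(e)):                     # congruence: reduce a subterm
        if isinstance(e[i], tuple):
            r = step(e[i])
            if r is not None:
                return e[:i] + (r,) + e[i + 1:]
    return None


def normalise(e, limit=100):
    """Reduce until no rule applies or the step limit is reached."""
    for _ in range(limit):
        nxt = step(e)
        if nxt is None:
            return e
        e = nxt
    return e


# Source term (\x. lift (x + 1)) @ 2 and its residual (\x'. 3) applied to void.
source = ("app", ("lam", "x", ("lift", ("add", ("var", "x"), ("num", 1)))), ("num", 2))
residual = ("app", ("lam", "x'", ("num", 3)), ("void",))
source_nf = normalise(source)
residual_nf = normalise(residual)
```

The source term reaches lift 3 and the residual term reaches 3, which is the specialisation of lift 3; this is one concrete instance of the behaviour the Simulation theorem describes.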



Proof of the Source Substitution Lemma. We are to prove that if
Γ ⊢ e₁ : τ₁ ↪ e'₁ : τ'₁ and Γ, x : τ₁ ↪ e'₁ : τ'₁ ⊢ e₂ : τ₂ ↪ e'₂ : τ'₂, then
Γ ⊢ e₂[e₁/x] : τ₂ ↪ e'₂ : τ'₂. The proof is by induction over the syntax of e₂.
The only interesting case is that for variables. For the variable x, we must
show that

    Γ ⊢ x[e₁/x] : τ₂ ↪ e'₂ : τ'₂

But from the second assumption, we know that

    Γ, x : τ₁ ↪ e'₁ : τ'₁ ⊢ x : τ₂ ↪ e'₂ : τ'₂

Consulting the specialisation rule for variables, it follows that e'₁ and e'₂ are
the same, as are τ₁ and τ₂, and τ'₁ and τ'₂. Since x[e₁/x] is e₁ and, by the
first assumption,

    Γ ⊢ e₁ : τ₁ ↪ e'₁ : τ'₁

the result follows. For other variables, the proof is trivial.



Proof of the Residual Substitution Lemma. We are to prove that if θ is a
substitution of residual terms for residual variables, and Γ ⊢ e : τ ↪ e' : τ',
then Γθ ⊢ e : τ ↪ e'θ : τ'. Once again the proof is by induction on the syntax
of e. We will prove the cases for variables and λ-expressions, since these are
the only rules that can introduce residual variables into the residual term.

For a variable x, we assume that Γ ⊢ x : τ ↪ e' : τ', which by the
specialisation rule for variables means that Γ must contain an assumption of
the form x : τ ↪ e' : τ'. Γθ therefore contains the assumption
x : τ ↪ e'θ : τ', and it follows that Γθ ⊢ x : τ ↪ e'θ : τ' as required.

For a λ-expression λx.e, we know that its specialisation uses the rule

     Γ, x : τ₁ ↪ x' : τ'₁ ⊢ e : τ₂ ↪ e' : τ'₂
    ───────────────────────────────────────────    x' ∉ FV(Γ)
    Γ ⊢ λx.e : τ₁ → τ₂ ↪ λx'.e' : τ'₁ → τ'₂

Since x' is not free in Γ, it cannot be renamed by θ, so we may conclude by the
induction hypothesis that

    Γθ, x : τ₁ ↪ x' : τ'₁ ⊢ e : τ₂ ↪ e'θ : τ'₂

Applying the specialisation rule for λ again, we derive

    Γθ ⊢ λx.e : τ₁ → τ₂ ↪ (λx'.e')θ : τ'₁ → τ'₂

as required.
Proof of the Simulation Theorem. We are to prove that if Γ ⊢ e₁ : τ ↪ e'₁ : τ'
and e₁ → e₂, then there exists an e'₂ such that Γ ⊢ e₂ : τ ↪ e'₂ : τ' and
e'₁ →* e'₂. This proof is also by induction on the syntax of e₁, and we will
present it in some detail.

- Case n. Trivial, since n does not reduce to anything.

- Case e₁ + e₂. According to the specialisation rule for +, we have

      Γ ⊢ eᵢ : int ↪ e'ᵢ : nᵢ  (i = 1, 2)    n = n₁ + n₂
      ──────────────────────────────────────────────────
               Γ ⊢ e₁ + e₂ : int ↪ • : n

  Suppose first that e₁ and e₂ are both values. Since

      Γ ⊢ e₁ : int ↪ e'₁ : n₁

  then e₁ must be n₁, and similarly for e₂. It follows that e₁ + e₂ → n, which
  specialises to • : n. It remains to show that • →* •, which it does in zero
  steps.

  Alternatively, suppose without loss of generality that e₁ + e₂ → e₃ + e₂ by
  reducing e₁ → e₃. Then by the induction hypothesis, there is an e'₃ such that
  e'₁ →* e'₃ and

      Γ ⊢ e₃ : int ↪ e'₃ : n₁

  Applying the specialisation rule for +, we derive

      Γ ⊢ e₃ + e₂ : int ↪ • : n

  and it remains only to show • →* • as before.

- Case lift e. We have lift e → lift e₀, and

         Γ ⊢ e : int ↪ e' : n
      ──────────────────────────────
      Γ ⊢ lift e : i̲n̲t̲ ↪ n : int

  We have e → e₀, and so by the induction hypothesis there is an e'₀ such that
  e' →* e'₀ and Γ ⊢ e₀ : int ↪ e'₀ : n. It follows that

      Γ ⊢ lift e₀ : i̲n̲t̲ ↪ n : int

  and since n →* n the proof is complete.

- Case x. Trivial since there is no reduction rule for variables.

- Case λx.e. We have λx.e → λx.e₀, and

       Γ, x : τ₁ ↪ x' : τ'₁ ⊢ e : τ₂ ↪ e' : τ'₂
      ───────────────────────────────────────────    x' ∉ FV(Γ)
      Γ ⊢ λx.e : τ₁ → τ₂ ↪ λx'.e' : τ'₁ → τ'₂

  So e → e₀, and by the induction hypothesis there is an e'₀ such that
  e' →* e'₀ and

      Γ, x : τ₁ ↪ x' : τ'₁ ⊢ e₀ : τ₂ ↪ e'₀ : τ'₂

  It follows that

      Γ ⊢ λx.e₀ : τ₁ → τ₂ ↪ λx'.e'₀ : τ'₁ → τ'₂

  and λx'.e' →* λx'.e'₀ as required.
- Case e₁@e₂. An application can be reduced in three different ways: a
  reduction may be made inside e₁, or inside e₂, or the application itself may
  be a β-redex which is reduced. The first two cases are proved in the same way
  as the λ case above, so we consider only the third. Suppose therefore that e₁
  is λx.e. Combining the specialisation rules for λ and @ we obtain

       Γ, x : τ₁ ↪ x' : τ'₁ ⊢ e : τ₂ ↪ e' : τ'₂
      ───────────────────────────────────────────
      Γ ⊢ λx.e : τ₁ → τ₂ ↪ λx'.e' : τ'₁ → τ'₂        Γ ⊢ e₂ : τ₁ ↪ e'₂ : τ'₁
      ────────────────────────────────────────────────────────────────────────
                 Γ ⊢ (λx.e)@e₂ : τ₂ ↪ (λx'.e') e'₂ : τ'₂

  Substituting e'₂ for x' using the Residual Substitution lemma, we know that

      Γ, x : τ₁ ↪ e'₂ : τ'₁ ⊢ e : τ₂ ↪ e'[e'₂/x'] : τ'₂

  Now by the Source Substitution lemma, we have

      Γ ⊢ e[e₂/x] : τ₂ ↪ e'[e'₂/x'] : τ'₂

  Since (λx.e)@e₂ → e[e₂/x] and (λx'.e') e'₂ → e'[e'₂/x'], the proof of this
  case is complete.

- Case fix e. This case is similar to application, and is omitted.

This completes the proof of the Simulation theorem for the fragment.



6 Extensions 

The tiny language we have considered so far illustrates only the basics of type
specialisation: it consists only of dynamic λ-calculus plus one kind of static
information. In reality the type specialiser accepts a much richer language. In
this section we discuss some of the extensions, and their proofs of correctness.



6.1 Enriching the Dynamic Language 

In addition to dynamic function types with dynamic λ-expressions and
applications, the type specialiser supports dynamic product types with tuples
and selectors, dynamic tagged sum types with constructor application and a case
expression, dynamic let expressions and conditionals. In each case we add a
dynamic version of each construct to the source language, and a residual version
to the residual language. The new reduction rules in the source and residual
language correspond. Each dynamic construct specialises to its corresponding
residual construct, with specialised sub-expressions. The substitution lemmata
extend easily, and the proofs of the Simulation theorem all take the same form:
a reduction in a sub-expression is simulated by reductions in the corresponding
residual sub-expression, while a reduction using a new source reduction rule is
simulated using the corresponding new residual reduction rule. The proofs are
modelled on those for λx.e and e₁@e₂.
6.2 Static Tagged Sums

One of the most interesting applications of the type specialiser is to remove type
tags when specialising an interpreter for a typed language. If such an interpreter
represents values using a universal type which is a tagged sum of the differently
typed alternatives, then the type specialiser can remove the tags, specialising
the universal type to an appropriate representation type at each use. To express
this, we must add static tagged sum types to our source language. We extend
the syntax of types and expressions as follows, where C is a tag, or ‘constructor’:

    τ ::= Σᵢ₌₁ⁿ Cᵢ τᵢ
    e ::= C e
        | case e of {Cᵢ xᵢ → eᵢ}ᵢ₌₁ⁿ end

Since the tags are static, the corresponding residual types must record which
constructor was actually applied. Thus we extend residual types as follows:

    τ' ::= C τ'

There is no need to extend the language of residual terms, since application and
inspection of static constructors will be specialised away.

The specialisation rule for a constructor application just records the
constructor in the residual type,

            Γ ⊢ e : τₖ ↪ e' : τ'ₖ
    ──────────────────────────────────────
    Γ ⊢ Cₖ e : Σᵢ₌₁ⁿ Cᵢ τᵢ ↪ e' : Cₖ τ'ₖ

while the rule for a case expression uses the statically-known constructor to
choose the corresponding branch:

    Γ ⊢ e : Σᵢ₌₁ⁿ Cᵢ τᵢ ↪ e' : Cₖ τ'ₖ
    Γ, xₖ : τₖ ↪ e' : τ'ₖ ⊢ eₖ : τ₀ ↪ e'ₖ : τ'₀
    ─────────────────────────────────────────────────────
    Γ ⊢ case e of {Cᵢ xᵢ → eᵢ}ᵢ₌₁ⁿ end : τ₀ ↪ e'ₖ : τ'₀

The Source and Residual substitution lemmata extend easily to these cases.
There is one new source reduction rule, namely

    case Cₖ e of {Cᵢ xᵢ → eᵢ}ᵢ₌₁ⁿ end → eₖ[e/xₖ]

and one new form of source value: C v. Notice that in order to prove the Value
Specialisation lemma, we must require the argument of the constructor to be
evaluated.

We will prove just the case in the Simulation theorem when the new reduction
rule is applied. Thus we must prove that if

    Γ ⊢ case Cₖ e of {Cᵢ xᵢ → eᵢ}ᵢ₌₁ⁿ end : τ₀ ↪ e'ₖ : τ'₀

then there is an e'' such that e'ₖ →* e'' and

    Γ ⊢ eₖ[e/xₖ] : τ₀ ↪ e'' : τ'₀
We shall take e'' to be just e'ₖ, and argue that from the assumption we know
that

    Γ, xₖ : τₖ ↪ e' : τ'ₖ ⊢ eₖ : τ₀ ↪ e'ₖ : τ'₀

where

    Γ ⊢ e : τₖ ↪ e' : τ'ₖ

By the Source Substitution lemma, it follows that Γ ⊢ eₖ[e/xₖ] : τ₀ ↪ e'ₖ : τ'₀
as required.

6.3 Polyvariance 

All interesting program specialisers are polyvariant, that is, they can specialise 
one expression in the source code multiple times. Polyvariance is provided in the 
type specialiser by extending the source and residual languages as follows: 

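\[
e ::= \mathbf{poly}\ e \;\mid\; \mathbf{spec}\ e
\qquad
e' ::= (e', \ldots, e') \;\mid\; \pi_k\, e'
\]

\[
\tau ::= \mathbf{poly}\ \tau
\qquad
\tau' ::= (\tau', \ldots, \tau')
\]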

The idea is that poly e can be specialised to a tuple of specialisations of e, 
from which spec e chooses an element . The residual type of such a tuple records 
which specialisations it contains. We add reduction rules 

\[
\mathbf{spec}\ (\mathbf{poly}\ e) \to e
\qquad
\pi_k\,(e'_1, \ldots, e'_n) \to e'_k
\]

and new source values $\mathbf{poly}\ e$, and residual values $(e'_1, \ldots, e'_n)$.

The specialisation rules for these constructions are: 

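\[
\frac{\Gamma \vdash e : \tau \hookrightarrow e'_i : \tau'_i, \quad i = 1 \ldots n}
     {\Gamma \vdash \mathbf{poly}\ e : \mathbf{poly}\ \tau \hookrightarrow (e'_1, \ldots, e'_n) : (\tau'_1, \ldots, \tau'_n)}
\]

\[
\frac{\Gamma \vdash e : \mathbf{poly}\ \tau \hookrightarrow e' : (\tau'_1, \ldots, \tau'_n)}
     {\Gamma \vdash \mathbf{spec}\ e : \tau \hookrightarrow \pi_k\, e' : \tau'_k}
\]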

The proofs of the substitution lemmata and the Simulation theorem go through
easily for this extension. For the Simulation theorem, a reduction
$\mathbf{poly}\ e_1 \to \mathbf{poly}\ e_2$ by $e_1 \to e_2$ can be simulated by
reductions in each specialisation, while the reduction
$\mathbf{spec}\ (\mathbf{poly}\ e) \to e$ is simulated by
$\pi_k\,(e'_1, \ldots, e'_n) \to e'_k$.
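
On the residual side, the only new computation is selecting a component from a tuple of specialisations. A minimal Haskell sketch of that projection step follows; the datatype Residual and the function stepProj are hypothetical names, and components are numbered from 1 as an assumption.

```haskell
-- Hypothetical residual term language with tuples and projections.
data Residual
  = RLit Int
  | RVar String
  | Tuple [Residual]   -- (e'_1, ..., e'_n)
  | Proj Int Residual  -- pi_k e', with k counted from 1
  deriving (Eq, Show)

-- The residual reduction rule: pi_k (e'_1, ..., e'_n) --> e'_k
stepProj :: Residual -> Maybe Residual
stepProj (Proj k (Tuple es))
  | k >= 1 && k <= length es = Just (es !! (k - 1))
stepProj _ = Nothing
```

Each use of spec in the source corresponds to one such projection in the residual program, with the index k fixed at specialisation time.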

6.4 Static Functions 

All interesting specialisers provide static functions, that is, functions which are
unfolded at specialisation time. So, too, does the type specialiser, via static
λ-expressions and static applications. The specialisation rule for static λ given in
[10] is

\[
\{x_i : \tau_i \hookrightarrow e'_i : \tau'_i\} \vdash
\lambda x.e : \tau_a \to \tau_b \hookrightarrow (e'_1, \ldots, e'_n) :
\mathbf{close}\ \{x_i : \tau_i \hookrightarrow \tau'_i\}\ \mathbf{in}\ \lambda x.e
\]

That is, a static function is represented in the residual program by a tuple of
its free variables, and the residual type carries the information needed to unfold
β-redexes.

Unfortunately, this specialisation rule violates the theorems we are trying
to prove. The Simulation Theorem fails because a reduction under a static λ
changes the residual type of the specialisation; since residual reduction does not
change types, it is impossible for the result of specialisation before the reduction
to reduce to the result of specialisation afterwards. The Source Substitution
lemma also fails, because a static λ specialises to a tuple with one element
per free variable; substituting for one of these variables changes the size of the
tuple. The problem here is that type specialisation essentially performs a closure
conversion, and closure conversion is not preserved by substitution.
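
The failure of the Source Substitution lemma can be seen by counting free variables before and after a substitution. The sketch below uses a hypothetical term type and assumes, as in the rule from [10], that the residual tuple has one component per free variable of the static function; substitution is again restricted to closed terms for simplicity.

```haskell
import Data.List (nub)

-- Hypothetical source terms; the names here are illustrative only.
data Expr
  = Var String
  | Lit Int
  | Add Expr Expr
  | Lam String Expr
  deriving (Eq, Show)

-- Free variables, in order of first occurrence.
freeVars :: Expr -> [String]
freeVars (Var x)   = [x]
freeVars (Lit _)   = []
freeVars (Add a b) = nub (freeVars a ++ freeVars b)
freeVars (Lam x b) = filter (/= x) (freeVars b)

-- Substitution b[e/x]. Assumes e is closed, so no capture can occur.
subst :: String -> Expr -> Expr -> Expr
subst x e (Var y)
  | x == y    = e
  | otherwise = Var y
subst _ _ (Lit n)   = Lit n
subst x e (Add a b) = Add (subst x e a) (subst x e b)
subst x e (Lam y b)
  | x == y    = Lam y b
  | otherwise = Lam y (subst x e b)

-- Size of the residual tuple: one component per free variable.
tupleSize :: Expr -> Int
tupleSize = length . freeVars

-- \x -> x + (y + z) has free variables y and z.
example :: Expr
example = Lam "x" (Add (Var "x") (Add (Var "y") (Var "z")))
```

Here `tupleSize example` counts two components, while `tupleSize (subst "y" (Lit 3) example)` counts only one, so the residual term and type before and after the substitution cannot agree.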

Our solution is to instead consider specialisation of closure-converted programs.
Thus we prove the correctness of a variant of the type specialisation
previously described. We extend the source and target languages as follows,

\[
e ::= \mathbf{close}\ \{x = e\}^{*}\ \mathbf{in}\ \lambda x.e \;\mid\; e\,@\,e
\qquad
e' ::= (e', \ldots, e') \;\mid\; \pi_k\, e'
\]

\[
\tau ::= \tau \to \tau
\qquad
\tau' ::= \mathbf{close}\ \{x : \tau \hookrightarrow \tau'\}^{*}\ \mathbf{in}\ \lambda x.e
\]

with the restriction that all the free variables of $\lambda x.e$ must be bound in the
associated definitions. Closures and residual tuples are both values.

We add a β reduction rule to the source language,

\[
(\mathbf{close}\ \{x_i = e_i\}\ \mathbf{in}\ \lambda x.e)\,@\,e_x \;\to\; e[e_i/x_i,\ e_x/x]
\]

and a rule for reducing projections to the residual language. Moreover, we forbid
reduction of the body of a static λ: the only reductions of closures take place
in the bindings $\{x_i = e_i\}$.

The specialisation rules for static closures and applications are 

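\[
\frac{\Gamma \vdash e_i : \tau_i \hookrightarrow e'_i : \tau'_i, \quad i \in 1 \ldots n}
     {\Gamma \vdash \mathbf{close}\ \{x_i = e_i\}\ \mathbf{in}\ \lambda x.e : \tau_1 \to \tau_2
      \hookrightarrow (e'_1, \ldots, e'_n) :
      \mathbf{close}\ \{x_i : \tau_i \hookrightarrow \tau'_i\}\ \mathbf{in}\ \lambda x.e}
\]

\[
\frac{\begin{array}{c}
        \Gamma \vdash e_f : \tau_x \to \tau_y \hookrightarrow e'_f :
          \mathbf{close}\ \{x_i : \tau_i \hookrightarrow \tau'_i\}\ \mathbf{in}\ \lambda x.e \\
        \Gamma \vdash e_x : \tau_x \hookrightarrow e'_x : \tau'_x \\
        \{x_i : \tau_i \hookrightarrow \pi_i\, e'_f : \tau'_i\},\ x : \tau_x \hookrightarrow e'_x : \tau'_x
          \vdash e : \tau_y \hookrightarrow e' : \tau'_y
      \end{array}}
     {\Gamma \vdash e_f\,@\,e_x : \tau_y \hookrightarrow e' : \tau'_y}
\]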

With these definitions, the substitution and value specialisation lemmata 
are easily proved. To prove the Simulation Theorem we must introduce another 
(easily proved) lemma: 

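Lemma (Reduction in Context Lemma) Let $\Gamma_2$ be obtained from $\Gamma_1$ by
making a reduction in one of the residual expressions. If
$\Gamma_1 \vdash e : \tau \hookrightarrow e'_1 : \tau'$, then there exists $e'_2$
such that $e'_1 \to^{*} e'_2$ and $\Gamma_2 \vdash e : \tau \hookrightarrow e'_2 : \tau'$.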

The proof of the Simulation Theorem now goes through. 

7 Related Work

In 1991, Gomard and Jones described λ-MIX, the first self-applicable partial
evaluator for the λ-calculus, which was so simple that much later work was
based on it. Gomard proved the correctness of the partial evaluator, that is, that
source and residual programs denote the same values [6]. The proof is based on
establishing a logical relation between the denotation of a two-level source term,
and the denotation of its one-level erasure. λ-MIX was the first partial evaluator
whose binding-time analysis was expressed as a type system, and the logical
relation is indexed by binding-time types.

Gomard’s original proof was somewhat flawed by an informal treatment of
fresh name generation. Moggi pointed this out, and gave a rigorous proof based
on an alternative semantics for λ-MIX using functor categories [15]. Using related
techniques, Filinski has proved the soundness and completeness [5] of Danvy’s
type-directed partial evaluation (TDPE) [3].

These proofs have in common that they are based on establishing logical
relations between denotational semantics of source and residual terms. This is
essentially the approach we first tried to follow to show the correctness of the
type specialiser. But since λ-MIX and TDPE do not transform types, the logical
relations are simpler to define, and since neither allows recursive binding-time
types, the problems they cause with well-definedness of logical relations do not
arise. (Recursive types are not really needed in λ-MIX, since dynamic computations
are essentially untyped.)

Other recent work on the correctness of partial evaluators has focussed on 
the correctness of binding-time analysis, rather than on specialisation proper. 

A closer analogy can be found with other recent work on type-directed trans- 
formations. John Hannan and Patrick Hicks have published a series of papers in 
which they present such transformations of higher order languages, for example 
[7,8]. Just like type specialisation, these transformations are specified by infe- 
rence rules, whose judgements relate a source term, a transformed term, and a 
type in an extended type language specifying how the former should be trans- 
formed into the latter. Proofs of correctness are outlined, and are quite similar 
to our own: source and target languages are given an operational semantics, and 
there is an analogue of our Simulation Theorem relating the two. Hannan and 
Hicks also prove that every well-typed source term can be transformed to a tar- 
get term, which is of course untrue for type specialisation, and that reductions 
of target terms can be simulated by the corresponding source terms. 

8 Discussion and Conclusions 

A first attempt to find a proof was based on giving a denotational semantics 
to source and target languages, and establishing a logical relation indexed by 
residual types between them. But this foundered when the relation proved to 
be ill-defined. The problem is that residual types may involve arbitrary type 
recursion under function arrows. A recursive type leads to a recursively defined 
logical relation, which only makes sense if the recursive definition has a least
fixed point. But since the formation of logical relations on function types is
antimonotonic in the left argument, the usual monotonicity argument that
a least fixed point exists does not apply.

It is possible that this approach might succeed even so. We could try to define 
a metric on relations, and show that the recursive definitions we are interested in 
are contractive, just as MacQueen, Plotkin and Sethi did to show that recursive 
types could be modelled by ideals [14]. But this would at best lead to a very 
technical proof, dependent on the detailed structure of the underlying semantic 
domains. Instead, we chose to pursue the more operational approach described 
in this paper. 

The proof we have presented is pleasingly simple, and we have some hope 
that the proof method will be robust to extensions of the type specialiser, not 
least since similar methods have been used successfully to prove the correctness 
of other type-directed transformations. The operational approach, inspired by
subject reduction, proved to be much easier to carry through than the
denotationally-based attempt. And of course, it is pleasing to know that type
specialisation actually is correct.

The proof does raise other questions, though. For example, earlier papers were
vague on whether the intended semantics of the object language was call-by-value
or call-by-name. In this paper we explicitly give it a call-by-name semantics. Is
type specialisation correct for a call-by-value language? One would hope that
a similar proof would go through, but the most obvious idea of restricting
β-reduction to $\beta_v$ redexes does not seem to work easily. Another interesting idea
would be to consider call-by-need reduction rules [1]: perhaps one could show
thereby that specialisation (of a suitably restricted language) does not duplicate
computations.

We have also focussed here on the relationship between source terms and 
residual terms - the dynamic part of the specialisation. Residual types in contrast 
play only a small role here. Yet we might also hope to be able to relate them to 
the source program. Residual types purport to carry static information about the 
source term they are derived from: in a sense they can be regarded as properties 
of source terms. For example, if
$\vdash f : \mathrm{int} \to \mathrm{int} \hookrightarrow f' : 42 \to 44$, then we would
expect that $f$ maps 42 to 44. Another interesting avenue would be to assign a
semantics to residual types as properties, and prove that specialisation produces 
properties that really hold. 

Abstract. Type classes in Haskell allow programmers to define functions
that can be used on a set of different types, with a potentially different
implementation in each case. For example, type classes are used to
support equality and numeric types, and for monadic programming. A 
commonly requested extension to support ‘multiple parameters’ allows a 
more general interpretation of classes as relations on types, and has many 
potentially useful applications. Unfortunately, many of these examples 
do not work well in practice, leading to ambiguities and inaccuracies in 
inferred types and delaying the detection of type errors. 

This paper illustrates the kind of problems that can occur with multi- 
ple parameter type classes, and explains how they can be resolved by 
allowing programmers to specify explicit dependencies between the pa- 
rameters. A particular novelty of this paper is the application of ideas 
from the theory of relational databases to the design of type systems. 



1 Introduction 

Type classes in Haskell [11] allow programmers to define functions that can be 
used on a set of different types, with a potentially different implementation in 
each case. Each class represents a set of types, and is associated with a particular 
set of member functions. For example, the type class Eq represents the set of all 
equality types, which is precisely the set of types on which the (==) operator 
can be used. Similarly, the type class Num represents the set of all numeric
types (including Int, Float, complex and rational numbers) on which standard
arithmetic operations like (+) and (-) can be used. These and several other
classes are defined in the standard Haskell prelude and libraries [11, 12]. The
language also allows programmers to define new classes or to extend existing 
classes to include new, user-defined datatypes. As such, type classes play an 
important role in many Haskell programs, both directly through uses of the 
member functions associated with a particular class, and indirectly in the use of 
various language constructs including a special syntax for monadic programming 
(the do-notation). 

* The research reported in this paper was supported by the USAF Air Materiel
Command, contract # F19628-96-C-0161.

G. Smolka (Ed.): ESOP/ETAPS 2000, LNCS 1782, pp. 230-244, 2000. 

The use of type classes is reflected by allowing types to include predicates.
For example, the type of the equality operator is written:

(==) :: Eq a => a -> a -> Bool

The type variable a used here represents an arbitrary type (bound by an implicit
universal quantifier), but the predicate Eq a then restricts the possible choices
for a to types that are in Eq. More generally, functions in Haskell have types of
the form P => τ, where P is some list of predicates and τ is a monotype. If P
is empty, then we usually abbreviate P => τ as τ. In most implementations, the
presence of a predicate in a function’s type indicates that an implicit parameter 
will be added to pass some appropriate evidence for that predicate at run-time. 
For example, we might use an implementation of equality on values of type a as 
evidence for a predicate of the form Eq a. Details of this implementation scheme 
may be found elsewhere [14]. 
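
The evidence-passing idea can be imitated by hand in ordinary Haskell. In the sketch below, a record stands in for the implicit parameter; the names EqDict, eqWith, containsD, eqInt and eqPair are hypothetical, and real implementations use their own internal representation of evidence.

```haskell
-- Evidence for a predicate "Eq a": a record holding an equality test on a.
newtype EqDict a = EqDict { eqWith :: a -> a -> Bool }

-- A function of type  Eq a => a -> [a] -> Bool  after the implicit
-- evidence parameter has been made explicit.
containsD :: EqDict a -> a -> [a] -> Bool
containsD d x = any (eqWith d x)

-- Evidence for Eq Int, built from the primitive equality on Int.
eqInt :: EqDict Int
eqInt = EqDict (==)

-- Evidence for pairs is built from evidence for the two components.
eqPair :: EqDict a -> EqDict b -> EqDict (a, b)
eqPair da db = EqDict (\(x, y) (u, v) -> eqWith da x u && eqWith db y v)
```

A call such as `containsD (eqPair eqInt eqInt) (1, 2) [(0, 0), (1, 2)]` corresponds to what a compiler might produce for an overloaded membership test on a list of pairs.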

In a predicate such as Eq a, we refer to Eq as the class name, and to a as the 
class parameter. Were it not for the use of a restricted character set, constraints 
like this might instead have been written in the form a ∈ Eq, reflecting an
intuition that Eq represents a set of types of which a is expected to be a member.
The Haskell syntax, however, which looks more like a curried function application,
suggests that it might be possible to allow classes to have more than one
parameter. For example, what might a predicate of the form R a b mean, where
two parameters a and b have been provided? The obvious answer is to interpret
R as a two-place relation between types, and to read R a b as the assertion that
a and b are related by R. This is a natural generalization of the one parameter 
case because sets are just one-place relations. More generally, we can interpret 
an n parameter class by an n-place relation on types. 

One potential application for multiple parameter type classes was suggested 
(but not pursued) by Wadler and Blott in the paper where type classes were first 
described [14]. The essence of their example was to use a two parameter class 
Coerce to describe a subtyping relation, with an associated coercion operator: 

coerce :: Coerce a b => a -> b.

In the decade since that paper was published, many other applications for mul- 
tiple parameter type classes have been discovered [13]; we will see some of these 
in later sections of the current paper. The technical foundations for multiple 
parameter classes have also been worked out during that time, and support for 
multiple parameter type classes is now included in some of the currently available 
Haskell implementations. So it is perhaps surprising that support for multiple 
parameter type classes is still not included in the Haskell standard, even in the 
most recent revision [11]. One explanation for this reticence is that some of 
the proposed applications have not worked particularly well in practice. These 
problems often occur because the relations on types that we can specify using 
simple extensions of Haskell are too general for practical applications. In par- 
ticular, they fail to capture important dependencies between parameters. More 
concretely, the use of multiple parameter classes can often result in ambiguities 
and inaccuracies in inferred types, and in delayed detection of type errors. 

In this paper, we show that many of these problems can be avoided by giving
programmers an opportunity to specify the desired relations on types more
precisely. The key idea is to allow the definitions of type classes to be annotated 
with functional dependencies — an idea that originates in the theory of relational 
databases. In Section 2, we describe the key features of Haskell type classes that 
will be needed to understand the contributions of this paper. In Section 3, we 
use the design of a simple library of collection types to illustrate the problems 
that can occur with multiple parameter classes, and to motivate the introduction 
of functional dependencies. Further examples are provided in Section 4. Basic 
elements of the theory of functional dependencies are presented in Section 5, and 
are used to explain their role during type inference in Section 6. In Section 7, 
we describe some further opportunities for using dependency information, and 
then we conclude with some pointers to future work in Section 8. 

2 Preliminaries: Type Classes in Haskell 

This section describes the class declarations that are used to introduce new 
(single parameter) type classes in Haskell, and the instance declarations that 
are used to populate them. Readers who are already familiar with these aspects
of Haskell should probably skip ahead to the next section. Those requiring
more than the brief overview given here should refer to the Haskell report [11]
or to the various tutorials and references listed on the Haskell website
at http://haskell.org.

Class Declarations: A class declaration specifies the name for a class and lists 
the member functions that each type in the class is expected to support. The 
actual types in each class — which are normally referred to as the instances of the 
class — are described using separate declarations, as will be described below. For 
example, an Eq class, representing the set of equality types, might be introduced 
by the following declaration: 

class Eq a where
    (==) :: a -> a -> Bool

The type variable a that appears in both lines here represents an arbitrary
instance of the class. The intended reading of the declaration is that, if a is a
particular instance of Eq, then we can use the (==) operator at type a -> a -> Bool
to compare values of type a.

Qualified Types: As we have already indicated, the restriction on the use of the 
equality operator is reflected in the type that is assigned to it: 

(==) :: Eq a => a -> a -> Bool

Types that are restricted by a predicate like this are referred to as qualified 
types [4]. Such types will be assigned to any function that makes either direct 
or indirect use of the member functions of a class at some unspecified type. For
example, the functions:

member x xs = any (x ==) xs
subset xs ys = all (\x -> member x ys) xs

will be assigned types:

member :: Eq a => a -> [a] -> Bool
subset :: Eq a => [a] -> [a] -> Bool.

Superclasses: Classes may be arranged in a hierarchy, and may have multiple 
member functions. The following example illustrates both with a declaration of 
the Ord class, which contains the types whose elements can be ordered using 
strict (<) and non-strict (<=) comparison operators: 

class Eq a => Ord a where
    (<), (<=) :: a -> a -> Bool

In this particular context, the => symbol should not be read as implication; in
fact reverse implication would be a more accurate reading, the intention being 
that every instance of Ord is also an instance of Eq. Thus Eq plays the role of a 
superclass of Ord. This mechanism allows the programmer to specify an expected 
relationship between classes: it is the compiler’s responsibility to ensure that this 
property is satisfied, or to produce an error diagnostic if it is not. 

Instance Declarations: The instances of any given class are described by a collec- 
tion of instance declarations. For example, the following declarations show how 
one might define equality for booleans, and for pairs: 

instance Eq Bool where
    x == y = if x then y else not y

instance (Eq a, Eq b) => Eq (a, b) where
    (x, y) == (u, v) = (x == u && y == v)

The first line of the second instance declaration tells us that an equality on values
of types a and b is needed to provide an equality on pairs of type (a, b). No such
preconditions are needed for the definition of equality on booleans. Even with just
these two declarations, we have already specified an equality operation on the 
infinite family of types that can be constructed from Bool by repeated uses of 
pairing. Additional declarations, which may be distributed over many modules, 
can be used to extend the class to include other datatypes. 
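
The two instance declarations can be tried out directly if the class is renamed so that it does not clash with the Prelude. In this sketch the names MyEq, (===) and nested are assumptions introduced for that purpose; the instance bodies follow the declarations above.

```haskell
-- A local copy of the class, renamed to avoid clashing with the Prelude's Eq.
class MyEq a where
  (===) :: a -> a -> Bool

instance MyEq Bool where
  x === y = if x then y else not y

-- Equality on pairs requires equality on both component types.
instance (MyEq a, MyEq b) => MyEq (a, b) where
  (x, y) === (u, v) = x === u && y === v

-- The two instances already cover nested pairs built from Bool.
nested :: Bool
nested = ((True, False), (False, True)) === ((True, False), (False, True))
```

The definition of nested type checks because the pair instance is applied twice, once for the outer pair and once for each inner pair, before bottoming out at the Bool instance.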

3 Example: Building a Library of Collection Types 

One of the most commonly suggested applications for multiple parameter type 
classes is to provide uniform interfaces to a wide range of collection types [10]. 
Such types might be expected to offer ways to construct empty collections, to
insert values, to test for membership, and so on. The following declaration, greatly
simplified for the purposes of presentation, introduces a two parameter class
Collects that could be used as the starting point for such a project:

class Collects e ce where
    empty :: ce
    insert :: e -> ce -> ce
    member :: e -> ce -> Bool

The type variable e used here represents the element type, while ce is the type 
of the collection itself. Within this framework, we might want to define instances 
of this class for lists or characteristic functions (both of which can be used to 
represent collections of any equality type), bit sets (which can be used to repre- 
sent collections of characters), or hash tables (which can be used to represent 
any collection whose elements have a hash function). Omitting standard imple- 
mentation details, this would lead to the following declarations: 

instance Eq e => Collects e [e] where ...
instance Eq e => Collects e (e -> Bool) where ...
instance Collects Char BitSet where ...
instance (Hashable e, Collects e ce)
    => Collects e (Array Int ce) where ...

All this looks quite promising; we have a class and a range of interesting im- 
plementations. Unfortunately, there are some serious problems with the class 
declaration. First, the empty function has an ambiguous type: 

empty :: Collects e ce => ce. 

By ‘ambiguous’ we mean that there is a type variable e that appears on the left
of the => symbol, but not on the right. The problem with this is that, according
to the theoretical foundations of Haskell overloading, we cannot guarantee a
well-defined semantics for any term with an ambiguous type [2, 4]. For this reason,
a Haskell system will reject any attempt to define or use such terms.

We can sidestep this specific problem by removing the empty member from 
the class declaration. However, although the remaining members, insert and 
member, do not have ambiguous types, we still run into problems when we try 
to use them. For example, consider the following two functions: 

f x y coll = insert x (insert y coll)
g coll = f True 'a' coll

for which Hugs infers the following types:

f :: (Collects a c, Collects b c) => a -> b -> c -> c
g :: (Collects Bool c, Collects Char c) => c -> c.

Notice that the type for f allows the parameters x and y to be assigned different
types, even though it attempts to insert each of the two values, one after the
other, into the same collection, coll. If we hope to model collections that contain
only one type of value, then this is clearly an inaccurate type. Worse still, the 
definition for g is accepted, without causing a type error. Thus the error in this 
code will not be detected at the point of definition, but only at the point of 
use, which might not even be in the same module. Obviously, we would prefer 
to avoid these problems, eliminating ambiguities, inferring more accurate types, 
and providing earlier detection of type errors. 
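
The behaviour described above can be reproduced in a present-day GHC with the multiple parameter class extension switched on. The sketch below omits empty, as suggested in the text, and gives f and g the types quoted above as explicit signatures; the list instance body and the value ok are assumptions added so that there is something to run.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, FlexibleContexts #-}

-- The two parameter class without the ambiguous empty member.
class Collects e ce where
  insert :: e -> ce -> ce
  member :: e -> ce -> Bool

-- Lists as collections of any equality type (illustrative implementation).
instance Eq e => Collects e [e] where
  insert x xs = x : xs
  member x xs = x `elem` xs

-- The type allows x and y to have different types.
f :: (Collects a c, Collects b c) => a -> b -> c -> c
f x y coll = insert x (insert y coll)

-- Accepted at the point of definition, although it inserts a Bool and a
-- Char into the same collection.
g :: (Collects Bool c, Collects Char c) => c -> c
g coll = f True 'a' coll

-- A well-behaved use; both element annotations are needed because nothing
-- links the element type to the collection type.
ok :: [Int]
ok = f (1 :: Int) (2 :: Int) []
```

The definition of g compiles, which is the delayed error detection the text complains about: a problem only surfaces when g is applied to a collection type for which both Collects Bool c and Collects Char c would have to hold.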



3.1 An Attempt to Use Constructor Classes 

Faced with the problems described above, some Haskell programmers might be 
tempted to use something like the following version of the class declaration: 

class Collects e c where
    empty :: c e
    insert :: e -> c e -> c e
    member :: e -> c e -> Bool

In fact this is precisely the approach taken by Okasaki [9], and by Peyton Jones
[10], in more realistic attempts to build this kind of library. The key difference
here is that we abstract over the type constructor c that is used to form
the collection type c e, and not over that collection type itself, represented by 
ce in the original class declaration. Thus Collects is an example of a constructor 
class [6] in which the second parameter is a unary type constructor, replacing the 
nullary type parameter ce that was used in the original definition. This change 
avoids the immediate problems that we mentioned above: 

- The empty operator has type Collects e c => c e, which is not ambiguous
  because both e and c appear on the right of the => symbol.

- The function f is assigned a more accurate type:

  f :: (Collects e c) => e -> e -> c e -> c e.

- The function g is now rejected, as required, with a type error because the
  type of f does not allow the two arguments to have different types.

This, then, is an example of a multiple parameter class that does actually work 
quite well in practice, without ambiguity problems. The reason that it works, 
at least intuitively, is that its two parameters are essentially independent of 
one another and so there is a good fit with the interpretation of Collects as a 
relatively unconstrained relation between types e and type constructors c. 

Unfortunately, this version of the Collects class is not as general as the
original class seemed to be. Only one of the four instances listed in Section 3 can
be used with this version of Collects because only one of them (the instance
for lists) has a collection type that can be written in the form c e, for some
type constructor c, and element type e. Some of the remaining instances can be
reworked to fit the constructor class framework by introducing dummy type and
value constructors, as in the following example:

newtype CharFun e = MkCharFun (e -> Bool)
instance Eq e => Collects e CharFun where ...

This approach, however, is not particularly attractive. It clutters up programs 
with the artificial type constructor CharFun, and with uses of the value construc- 
tor MkCharFun to convert between the two distinct but equivalent representa- 
tions of characteristic functions. The workaround is also limited, and cannot, in 
general, deal with cases like the BitSet example, where the element type is fixed 
and not a variable e that we can abstract over. 
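
For readers who want to experiment, the constructor class version can be written out in full for GHC. The class and the CharFun wrapper follow the text; the method bodies for lists and for characteristic functions are assumptions filled in for illustration.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

-- The second parameter c is a unary type constructor.
class Collects e c where
  empty  :: c e
  insert :: e -> c e -> c e
  member :: e -> c e -> Bool

-- Lists fit directly: the collection type [e] is the constructor [] applied to e.
instance Eq e => Collects e [] where
  empty  = []
  insert = (:)
  member = elem

-- Characteristic functions need a wrapper to expose a constructor of the
-- right kind.
newtype CharFun e = MkCharFun (e -> Bool)

instance Eq e => Collects e CharFun where
  empty                  = MkCharFun (const False)
  insert x (MkCharFun p) = MkCharFun (\y -> y == x || p y)
  member x (MkCharFun p) = p x

-- Both inserted values must now have the same type e.
f :: Collects e c => e -> e -> c e -> c e
f x y coll = insert x (insert y coll)
```

The explicit wrapping and unwrapping of MkCharFun in every method is the clutter the text refers to, and no such wrapper helps with a collection like BitSet whose element type is fixed.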

3.2 Using Parametric Type Classes 

Another alternative is to use parametric type classes [3] (PTC), with predicates
of the form ce ∈ Collects e, meaning that ce is a member of the class Collects e.
Intuitively, there is one type class Collects e for each choice of the e parameter.
The definition of a parametric Collects class looks much like the original:

class ce ∈ Collects e where
    empty :: ce
    insert :: e -> ce -> ce
    member :: e -> ce -> Bool

All of the instance declarations that we gave for the original Collects class in
Section 3 can be adapted to the syntax of PTC, without introducing artificial
type constructors. What makes it different from the two parameter class in
Section 3 is the implied assumption that the element type e is uniquely determined
by the collection type ce. A compiler that supports PTC must ensure that the
declared instances of Collects do not violate this property. In return, it can use
this information to avoid ambiguity and to infer more accurate types. For example,
the type of empty is now ∀e,ce.(ce ∈ Collects e) => ce, and we do not
need to treat this as being ambiguous because the unknown element type e is
uniquely determined by ce.

Thus, PTC provides exactly the tools that we need to define and work with 
a library of collection classes. In our opinion, the original work on PTC has not 
received the attention that it deserves. In part, this may be because it was seen, 
incorrectly, as an alternative to constructor classes and not, more accurately, 
as an orthogonal extension. In addition, there has never been even a prototype 
implementation for potential users to experiment with. 

3.3 Using Functional Dependencies 

In this paper, we describe a generalization of parametric type classes that allows 
programmers to declare explicit functional dependencies between the parameters 
of a predicate. For example, we can achieve the same effects as PTC, with no 
further changes in notation, by annotating the original class definition with a
dependency ce ⇝ e, to be read as “ce uniquely determines e.”

class Collects e ce | ce ⇝ e where
    empty :: ce
    insert :: e -> ce -> ce
    member :: e -> ce -> Bool

More generally, we allow class declarations to be annotated with (zero or more)
dependencies of the form (x1, ..., xn) ⇝ (y1, ..., ym), where x1, ..., xn and y1,
..., ym are type variables and m, n > 0¹. Such a dependency is interpreted as an
assertion that the y parameters are uniquely determined by the x parameters.
Dependencies appear only in class declarations, and not in any other part of the 
language: the syntax for instance declarations, class constraints, and types is 
completely unchanged. For convenience, we allow the parentheses around a list 
of type variables in a dependency to be omitted if only a single variable is used. 

This approach is strictly more general than PTC because it allows us to
express a larger class of dependencies, including mutual dependencies such as
{a ⇝ b, b ⇝ a}. It is also easier to integrate with the existing syntax of Haskell
because it does not require any changes to the syntax of predicates.
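
The annotated Collects class can be tried in GHC, which supports functional dependencies through the FunctionalDependencies extension and writes the dependency with an ordinary arrow, as in `| ce -> e`. The sketch below is an illustration under that concrete syntax; the instance bodies and the Integer-based BitSet are assumptions, not definitions from the paper.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

import Data.Bits (setBit, testBit)

-- The collection type ce uniquely determines the element type e.
class Collects e ce | ce -> e where
  empty  :: ce
  insert :: e -> ce -> ce
  member :: e -> ce -> Bool

instance Eq e => Collects e [e] where
  empty  = []
  insert = (:)
  member = elem

-- Hypothetical bit set: bit n is set when the character with code n is present.
newtype BitSet = BitSet Integer

instance Collects Char BitSet where
  empty               = BitSet 0
  insert c (BitSet n) = BitSet (setBit n (fromEnum c))
  member c (BitSet n) = testBit n (fromEnum c)

-- With the dependency, both inserted values must share the element type
-- determined by the collection.
f :: Collects e ce => e -> e -> ce -> ce
f x y coll = insert x (insert y coll)

-- A definition such as  g coll = f True 'a' coll  is now rejected, because
-- it would require Bool and Char to be the same type.
```

No wrapper types are needed, empty is no longer ambiguous, and in an expression such as `f 1 2 []` compared against a list of Int the literals are resolved to Int through the dependency alone.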

By including dependency information, programmers can specify multiple pa- 
rameter classes more precisely. To illustrate this, consider the following examples: 

class C a b where ...

class D a b | a ⇝ b where ...

class E a b | a ⇝ b, b ⇝ a where ...

From the first declaration, we can tell only that C is a binary relation. The
dependency a ⇝ b in the second declaration tells us that D is not just a relation,
but actually a (partial) function. From the two dependencies in the last
declaration, we can see that E represents a (partial) one-one mapping.

The compiler is responsible for ensuring that the instances in scope at any
given point are consistent with any declared dependencies². For example, the
following declarations cannot appear together because they violate the dependency
for D, even though either one on its own would be acceptable:

instance D Bool Int where . . . 
instance D Bool Char where . . . 

Note also that the following declaration is not allowed, even by itself: 

instance D [a] b where ...

The problem here is that this instance would allow one particular choice of [a]
to be associated with more than one choice for b, which contradicts the dependency
specified in the definition of D. More generally, this means that, in any
declaration of the form instance ... => D t s where ..., for some particular
types t and s, the only variables that can appear in s are the ones that appear
in t, and hence, if the type t is known, then s will be uniquely determined.

¹ For practical reasons, a slightly different syntax is used for dependencies in the
current prototype implementation, details of which are included in the distribution.
² Superclass declarations are handled in a similar way, leaving the compiler to ensure
that every instance of a given class is also an instance of any superclasses.

4 Further Examples 

This section presents two additional examples to show how the use of functional 
dependencies can allow us to give more accurate specifications and to make more 
practical use of multiple parameter type classes. 

Arithmetic Operations The Haskell prelude treats arithmetic functions like 
addition (+) and multiplication (*) as functions of type Num a ⇒ a → a → a, 
which means that the result will always be of the same type as the arguments. 
A more flexible approach would allow different argument types so that we could 
add two Int values to get an Int result, or add an Int to a Float to get a Float 
result. This more flexible approach can be coded as follows: 

class Add a b c | (a, b) ↝ c where (+) :: a → b → c 
class Mul a b c | (a, b) ↝ c where (*) :: a → b → c 

instance Mul Int Int Int where . . . 
instance Mul Int Float Float where . . . 
instance Mul Float Int Float where . . . 
instance Mul Float Float Float where . . . 

In a separate linear algebra package, we might further extend our classes with 
arithmetic operations on vectors and matrices: 

instance Mul a b c ⇒ Mul a (Vec b) (Vec c) where . . . 
instance Mul a b c ⇒ Mul a (Mat b) (Mat c) where . . . 
instance (Mul a b c, Add c c d) 
             ⇒ Mul (Mat a) (Mat b) (Mat d) where . . . 

Without dependency information, we quickly run into problems with ambiguity. 
For example, even simple expressions like (1 * 2) * 3 have ambiguous types: 

(1 * 2) * 3 :: (Mul Int Int a, Mul a Int b) ⇒ b. 

Using the dependencies, however, we can determine that a = Int, and then that 
b = Int, and so deduce that the expression has type Int. This example shows 
that it can be useful to allow multiple types on the left hand side of a dependency. 
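
The effect of the dependency on type inference can be tried out in a present-day compiler. The following is only a sketch, and it rests on assumptions that are not part of the paper: it uses GHC's FunctionalDependencies extension, where the dependency (a, b) ↝ c is spelled `a b -> c`, and the names `one`, `two`, `three`, `product3`, and `scaled` are invented for the illustration.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

import Prelude hiding ((*))
import qualified Prelude as P

infixl 7 *

-- GHC spelling of the dependency (a, b) ~> c.
class Mul a b c | a b -> c where
  (*) :: a -> b -> c

instance Mul Int Int Int where
  (*) = (P.*)

instance Mul Int Float Float where
  x * y = fromIntegral x P.* y

instance Mul Float Int Float where
  x * y = x P.* fromIntegral y

instance Mul Float Float Float where
  (*) = (P.*)

-- Hypothetical test values; the argument types are fixed to Int here
-- because numeric literals are themselves overloaded in Haskell.
one, two, three :: Int
one   = 1
two   = 2
three = 3

-- No annotation is given for the result: the dependency determines that
-- the intermediate type and the result type are both Int.
product3 = (one * two) * three

-- Mixing argument types: the result type Float is again determined.
scaled = (one * two) * (3.75 :: Float)
```

Loading this module in GHCi and asking for the types of `product3` and `scaled` should report Int and Float respectively; removing `| a b -> c` from the class declaration is expected to make both definitions ambiguous.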

Finite Maps A finite map is an indexed collection of elements that provides 
operations to look up the value associated with a particular index, or to add a 
new binding. This can be described by a class: 

class FiniteMap i e fm | fm ↝ (i, e) where 
    emptyFM :: fm 
    lookup  :: i → fm → Maybe e 
    extend  :: i → e → fm → fm 

Here, fm is the finite map type, which uniquely determines both the index type 
i and the element type e. Association lists, functions, and arrays all fit naturally 
into this framework. We can also use a bit set as an indexed collection of booleans: 

instance (Eq i) ⇒ FiniteMap i e [(i, e)] where . . . 
instance (Eq i) ⇒ FiniteMap i e (i → e) where . . . 
instance (Ix i) ⇒ FiniteMap i e (Array i e) where . . . 
instance FiniteMap Int Bool BitSet where . . . 

This is a variation on the treatment of collection types in Section 3, and, if the 
dependency is omitted, then we quickly run into very similar kinds of problem. 
We have included this example here to show that it can be useful to allow 
multiple types on the right hand side of a dependency. 
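
As a rough illustration of the class above, the sketch below gives only the association-list instance, again in GHC's syntax (an assumption: `fm -> i e` stands for fm ↝ (i, e)). The values `phoneBook` and `aliceNumber` are hypothetical examples, not taken from the paper.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

import Prelude hiding (lookup)
import qualified Prelude as P

-- The map type fm determines both the index type i and the element type e.
class FiniteMap i e fm | fm -> i e where
  emptyFM :: fm
  lookup  :: i -> fm -> Maybe e
  extend  :: i -> e -> fm -> fm

-- Association lists as finite maps; newer bindings shadow older ones.
instance Eq i => FiniteMap i e [(i, e)] where
  emptyFM       = []
  lookup        = P.lookup
  extend i e fm = (i, e) : fm

-- Only the map type is annotated. The types of the indices and of the
-- numeric literals follow from the dependency.
phoneBook :: [(String, Int)]
phoneBook = extend "bob" 7 (extend "alice" 3 emptyFM)

-- Inferred as Maybe Int without any annotation.
aliceNumber = lookup "alice" phoneBook
```

Without the dependency, the literals 7 and 3 and the result of `lookup` would have no way to be connected to the element type of `phoneBook`, which is the kind of problem the text refers to.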



5 Relations and Functional Dependencies 

In this section, we provide a brief primer on the theory of relations and 
functional dependencies, as well as a summary of our notation. These ideas were 
originally developed as a foundation for relational database design [1]. They are 
well-established, and more detailed presentations of the theory, and of useful 
algorithms for working with them in practical settings, can be found in standard 
textbooks on the theory of databases [8]. A novelty of the current paper is in 
applying them to the design of a type system. 



5.1 Relations 

Following standard terminology, a relation $R$ over an indexed family of sets 
$\{D_i\}_{i \in I}$ is just a set of tuples, each of which is an indexed family of values 
$\{t_i\}_{i \in I}$ such that $t_i \in D_i$ for each $i \in I$. More formally, $R$ is just a subset of 
$\prod_{i \in I} D_i$, where a tuple $t \in (\prod_{i \in I} D_i)$ is a function that maps each index 
value $i \in I$ to a value $t_i \in D_i$ called the $i$th component of $t$. In the special case 
where $I = \{1, \ldots, n\}$, this reduces to the familiar special case where tuples are 
values $(t_1, \ldots, t_n) \in D_1 \times \cdots \times D_n$. If $X \subseteq I$, then we write $t_X$, pronounced “t at 
X”, for the restriction of a tuple $t$ to $X$. Intuitively, $t_X$ just picks out the values 
of $t$ for the indices appearing in $X$, and discards any remaining components. 



5.2 Functional Dependencies 

In the context of an index set $I$, a functional dependency is a term of the form 
$X \leadsto Y$, read as “X determines Y,” where $X$ and $Y$ are both subsets of $I$. If a 
relation satisfies a functional dependency $X \leadsto Y$, then the values of any tuple 
at $Y$ are uniquely determined by the values of that tuple at $X$. For example, 
taking $I = \{1, 2\}$, relations satisfying $\{\{1\} \leadsto \{2\}\}$ are just partial functions 
from $D_1$ to $D_2$, while relations satisfying $\{\{1\} \leadsto \{2\}, \{2\} \leadsto \{1\}\}$ are partial, 
injective functions from $D_1$ to $D_2$. 
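
For a finite relation, this definition can be checked directly. The sketch below is an illustration under simplifying assumptions: tuples are plain lists, indices are list positions counted from 0 (the text counts from 1), and the names `restrict`, `satisfies`, and `example` are invented here.

```haskell
-- t_X: the restriction of a tuple t to the index positions in xs.
restrict :: [Int] -> [v] -> [v]
restrict xs t = [ t !! i | i <- xs ]

-- A relation r satisfies X ~> Y if any two tuples that agree at X
-- also agree at Y.
satisfies :: Eq v => ([Int], [Int]) -> [[v]] -> Bool
satisfies (xs, ys) r =
  and [ restrict ys t == restrict ys s
      | t <- r, s <- r, restrict xs t == restrict xs s ]

-- A binary relation that is a partial function from the first component
-- to the second, but is not injective (10 is the image of both 1 and 3).
example :: [[Int]]
example = [[1, 10], [2, 20], [3, 10]]
```

Here `satisfies ([0], [1]) example` holds, while `satisfies ([1], [0]) example` does not, matching the reading of the two dependencies as "function" and "injective function".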

If $F$ is a set of functional dependencies, and $J \subseteq I$ is a set of indices, then 
the closure of $J$ with respect to $F$, written $J_F^+$, is the smallest set such that 
$J \subseteq J_F^+$, and that, if $(X \leadsto Y) \in F$, and $X \subseteq J_F^+$, then $Y \subseteq J_F^+$. For example, if 
$I = \{1, 2\}$, and $F = \{\{1\} \leadsto \{2\}\}$, then $\{1\}_F^+ = I$, and $\{2\}_F^+ = \{2\}$. Intuitively, 
the closure $J_F^+$ is the set of indices that are uniquely determined, either directly 
or indirectly, by the indices in $J$ and the dependencies in $F$. Closures like this 
are easy to compute using a simple fixed point iteration. 
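
One possible rendering of that fixed point iteration is sketched below. It is an illustration only: sets are represented as sorted, duplicate-free lists to stay within the standard libraries, and the constructor `:~>` and the function name `closure` are choices made for this example.

```haskell
import Data.List (nub, sort)

-- A functional dependency X ~> Y over indices of type i.
data FunDep i = [i] :~> [i]

-- Closure of a set of indices with respect to a list of dependencies.
-- Each pass adds the right-hand side of every dependency whose left-hand
-- side is already covered; iteration stops when nothing new is added.
closure :: Ord i => [FunDep i] -> [i] -> [i]
closure fds = go . norm
  where
    norm = sort . nub
    go js
      | js' == js = js
      | otherwise = go js'
      where
        js' = norm (js ++ concat [ ys | xs :~> ys <- fds, all (`elem` js) xs ])
```

With the dependencies `[[1] :~> [2]]`, the closure of `[1]` is `[1, 2]` and the closure of `[2]` is `[2]`, as in the example in the text. Termination follows because each pass either adds an index or stops, and only finitely many indices occur in the dependencies.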



6 Typing with Functional Dependencies 

This section explains how to extend an implementation of Haskell to deal with 
functional dependencies. In fact the tools that we need are obtained as a special 
case of improvement for qualified types [5]. We will describe this briefly here; 
space restrictions prevent a more detailed overview. To simplify the presentation, 
we will assume that there is a set of indices (i.e., parameter names), written $I_C$, 
and a corresponding set of functional dependencies, written $F_C$, for each class 
name $C$. We will also assume that all predicates are written in the form $C\,t$, 
where $t$ is a tuple of types indexed by $I_C$. This allows us to abstract away from 
the order in which the components are written in a particular implementation. 

The type system of Haskell can be described using judgements of the form 
$P \mid A \vdash E : \tau$. Each such judgement represents an assertion that an expression $E$ 
can be assigned a type $\tau$, using the assumptions in $A$ to type any free variables, 
and providing that the predicates in $P$ are satisfied. When we say that a set of 
predicates is satisfied, we mean that they are all implied by the class and instance 
declarations that are in scope at the corresponding point in the program. For a 
given $A$ and $E$, the goal of type inference is to find the most general choices for 
$P$ and $\tau$ such that $P \mid A \vdash E : \tau$. If successful, we can infer a principal type for 
$E$ by forming the qualified type $P \Rightarrow \tau$, without looking at the predicates in 
$P$, and then quantifying over all variables that appear in $P \Rightarrow \tau$ but not in $A$. 

One of the main results of the theory of improvement is that we can apply 
improving substitutions to the predicate set $P$ at any point during type inference 
(and as often as we like), without compromising on a useful notion of principal 
types. Intuitively, an improving substitution is just a substitution that can be 
applied to a particular set of predicates without changing its satisfiability 
properties. To make this more precise, we will write $\lfloor P \rfloor$ for the set of satisfiable 
instances of $P$, which is defined by: 

$\lfloor P \rfloor = \{\, SP \mid S \text{ is a substitution and the predicates in } SP \text{ are satisfied} \,\}$. 

In this setting, we say that $S$ is an improving substitution for $P$ if $\lfloor P \rfloor = \lfloor SP \rfloor$, 
and if the only variables involved in $S$ that do not also appear in $P$ are ‘new’ or 
‘fresh’ type variables. From a practical perspective, this simply means that the 
substitution will not change the set of environments or the set of types at which 
a given value can be used. The restriction to new variables is necessary to avoid 
conflicts with other type variables that might already be in use. 

Improvement cannot play a useful role in a standard Haskell type system: 
the language does not restrict the choice of instances for any given type class, 
and hence the only improving substitutions that we can obtain are equivalent to an 
identity substitution. With the introduction of functional dependencies, however, 
we do restrict the set of instances that can be defined, and this leads to 
opportunities for improvement. For example, by prohibiting the definition of instances 
of the form Collects a [b] where a ≠ b, we know that we can use an improving 
substitution [a/b] and map any such predicate into the form Collects a [a]. 

6.1 Ensuring that Dependencies are Valid 

Our first task is to ensure that all declared instances for a class $C$ are consistent 
with the functional dependencies in $F_C$. For example, suppose that we have an 
instance declaration for $C$ of the form: 

instance . . . ⇒ C t where . . . 

Now, for each $(X \leadsto Y) \in F_C$, we must ensure that $TV(t_Y) \subseteq TV(t_X)$, or 
otherwise the elements of $t_Y$ might not be uniquely determined by the elements 
of $t_X$. (The notation $TV(X)$ refers to the set of type variables appearing free in 
the object $X$.) A further restriction is needed to ensure pairwise compatibility 
between instance declarations for $C$. For example, if we have a second instance: 

instance . . . ⇒ C s where . . . , 

and a dependency $(X \leadsto Y) \in F_C$, then we must ensure that $t_Y = s_Y$ whenever 
$t_X = s_X$. In fact, on the assumption that the two instances will normally contain 
type variables, which could later be instantiated to more specific types, we 
will actually need to check that: for all (kind-preserving) substitutions $S$, if 
$S t_X = S s_X$, then $S t_Y = S s_Y$. It is easy to see that this test can be reduced to 
checking that, if $t_X$ and $s_X$ have a most general unifier $U$, then $U t_Y = U s_Y$. This 
is enough to guarantee that the declared dependencies are satisfied. For example, 
the instance declarations in Section 3 are consistent with the dependency ce ↝ e. 
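
The two tests described above can be sketched on a toy representation of types. Everything in this block is an assumption made for illustration rather than the prototype's implementation: types are first-order terms, kinds are ignored, class parameters are addressed by position (from 0), and the two instance heads passed to `consistent` are assumed to use disjoint variable names.

```haskell
import Data.List (nub)

data Type = TVar String | TCon String [Type] deriving (Eq, Show)

type Subst = [(String, Type)]
type FD    = ([Int], [Int])      -- (X, Y): positions in X determine those in Y

apply :: Subst -> Type -> Type
apply s (TVar v)    = maybe (TVar v) id (lookup v s)
apply s (TCon c ts) = TCon c (map (apply s) ts)

-- TV(t): the type variables of a type.
tv :: Type -> [String]
tv (TVar v)    = [v]
tv (TCon _ ts) = nub (concatMap tv ts)

-- Most general unifier of two equally long lists of types, if any.
unifyAll :: [Type] -> [Type] -> Maybe Subst
unifyAll ts ss = go [] (zip ts ss)
  where
    go s [] = Just s
    go s ((a, b) : rest) =
      case (apply s a, apply s b) of
        (TVar v, t) -> bind v t
        (t, TVar v) -> bind v t
        (TCon c xs, TCon d ys)
          | c == d && length xs == length ys -> go s (zip xs ys ++ rest)
          | otherwise -> Nothing
      where
        bind v t
          | t == TVar v   = go s rest
          | v `elem` tv t = Nothing      -- occurs check
          | otherwise     = go ((v, t) : [ (w, apply [(v, t)] u) | (w, u) <- s ]) rest

at :: [Type] -> [Int] -> [Type]
at ts ixs = [ ts !! i | i <- ixs ]

-- First test: TV(t_Y) must be contained in TV(t_X).
covered :: FD -> [Type] -> Bool
covered (xs, ys) t =
  all (`elem` concatMap tv (t `at` xs)) (concatMap tv (t `at` ys))

-- Second test: if t_X and s_X unify with mgu U, then U t_Y = U s_Y.
consistent :: FD -> [Type] -> [Type] -> Bool
consistent (xs, ys) t s =
  case unifyAll (t `at` xs) (s `at` xs) of
    Nothing -> True
    Just u  -> map (apply u) (t `at` ys) == map (apply u) (s `at` ys)
```

For the class D of Section 3 with dependency `([0], [1])`, the heads `D Bool Int` and `D Bool Char` fail `consistent`, and the head `D [a] b` fails `covered`, in line with the examples given earlier.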

6.2 Improving Inferred Types 

There are two ways that a dependency $(X \leadsto Y) \in F_C$ for a class $C$ can be used 
to help infer more accurate types: 

- If we have predicates $(C\,t)$ and $(C\,s)$ with $t_X = s_X$, then $t_Y$ and $s_Y$ must 
  be equal. 

- Suppose that we have an inferred predicate $C\,t$, and an instance: 

  instance . . . ⇒ C t' where . . . 

  If $t_X = S t'_X$, for some substitution $S$ (which could be calculated by one-way 
  matching), then $t_Y$ and $S t'_Y$ must be equal. 

In both cases, we can use unification to ensure that the equalities are satisfied, 
and to calculate a suitable improving substitution [5]. If unification fails, then 
we have detected a type error. Note that we will, in general, need to iterate this 
process until no further opportunities for improvement can be found. 
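
The first of the two rules can be sketched as a function that collects the equations to be handed to the unifier. This is a partial illustration under stated assumptions: only the predicate-against-predicate rule is covered, agreement at $X$ is tested syntactically, parameters are addressed by position (from 0), and the names `Pred`, `improvements`, and `at` are invented here. The instance-based rule and the iteration to a fixed point are left out.

```haskell
data Type = TVar String | TCon String [Type] deriving (Eq, Show)

type Pred = (String, [Type])     -- class name and its tuple of arguments
type FD   = ([Int], [Int])       -- (X, Y): positions in X determine those in Y

at :: [Type] -> [Int] -> [Type]
at ts ixs = [ ts !! i | i <- ixs ]

-- For every pair of predicates on the same class that agree at X,
-- emit the equations t_Y = s_Y that a unifier would then have to solve.
improvements :: [(String, [FD])] -> [Pred] -> [(Type, Type)]
improvements env ps =
  [ eq
  | (i, (c, t)) <- indexed
  , (j, (d, s)) <- indexed
  , i < j, c == d
  , (xs, ys) <- maybe [] id (lookup c env)
  , t `at` xs == s `at` xs
  , eq <- zip (t `at` ys) (s `at` ys)
  , fst eq /= snd eq ]
  where
    indexed = zip [0 :: Int ..] ps
```

For a class `Collects e ce` with the dependency ce ↝ e, written here as `([1], [0])`, the predicates `Collects a c` and `Collects b c` give rise to the single equation `a = b`.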

6.3 Detecting Ambiguity 

As mentioned in Section 3, we cannot guarantee a well-defined semantics for any 
function that has an ambiguous type. With the standard definition, a type of 
the form $(\forall a_1. \ldots \forall a_n. P \Rightarrow \tau)$ is ambiguous if $(\{a_1, \ldots, a_n\} \cap TV(P)) \not\subseteq TV(\tau)$, 
indicating that one of the quantified variables $a_i$ appears in $TV(P)$ but not in 
$TV(\tau)$. Our intuition is that, if there is no reference to $a_i$ in the body of the type, 
then there will be no way to determine how it should be bound when the type is 
instantiated. However, in the presence of functional dependencies, there might 
be another way to find the required instantiation of $a_i$. We need not insist that 
every $a \in TV(P)$ is mentioned explicitly in $\tau$, so long as they are all uniquely 
determined by the variables in $TV(\tau)$. 

The first step to formalizing this idea is to note that every set of predicates 
$P$ induces a set of functional dependencies $F_P$ on the type variables in $TV(P)$: 

$F_P = \{\, TV(t_X) \leadsto TV(t_Y) \mid (C\,t) \in P,\ (X \leadsto Y) \in F_C \,\}$. 

This has a fairly straightforward reading: if all of the variables in $t_X$ are known, 
and if $X \leadsto Y$, then the components of $t$ at $X$ are also known, and hence so are 
the components, and thus the type variables, in $t$ at $Y$. 

To determine if a type $(\forall a_1. \ldots \forall a_n. P \Rightarrow \tau)$ is ambiguous, we calculate the 
set of dependencies $F_P$, and then take the closure of $TV(\tau)$ with respect to $F_P$ to 
obtain the set of variables that are determined by $\tau$. The type is ambiguous only 
if there are variables $a_i$ in $P$ that are not included in this closure. More concisely, 
the type is ambiguous if, and only if, $(\{a_1, \ldots, a_n\} \cap TV(P)) \not\subseteq (TV(\tau))_{F_P}^+$. 

On a related point, we note that current implementations of Haskell are 
required to check that, in any declaration of the form instance P ⇒ C t where . . ., 
only the variables appearing in $t$ can be used in $P$ (i.e., we must ensure that 
$TV(P) \subseteq TV(t)$). In light of the observations that have been made in this 
section, we can relax this to require only that $TV(P) \subseteq (TV(t))_{F_P}^+$. Thus $P$ may 
contain variables that are not explicitly mentioned in $t$, provided that they are 
still determined by the variables in $t$. 
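
The ambiguity test can be sketched as follows. To keep the example short, each predicate argument is represented only by the list of type variables that occur in it, and class parameters are addressed by position (from 0); these simplifications, and all of the names used, are assumptions of the sketch.

```haskell
import Data.List (nub, sort)

type Var  = String
type FD   = ([Int], [Int])
type Pred = (String, [[Var]])    -- class name, free variables of each argument

-- Closure of a set of variables under a list of dependencies.
closure :: [([Var], [Var])] -> [Var] -> [Var]
closure fds = go . norm
  where
    norm = sort . nub
    go vs
      | vs' == vs = vs
      | otherwise = go vs'
      where
        vs' = norm (vs ++ concat [ ys | (xs, ys) <- fds, all (`elem` vs) xs ])

-- F_P: the dependencies that a predicate set induces on type variables.
induced :: [(String, [FD])] -> [Pred] -> [([Var], [Var])]
induced env ps =
  [ (varsAt xs, varsAt ys)
  | (c, args) <- ps
  , (xs, ys) <- maybe [] id (lookup c env)
  , let varsAt ixs = concat [ args !! i | i <- ixs ] ]

-- A type (forall as. P => tau) is ambiguous if some quantified variable
-- of P lies outside the closure of TV(tau) with respect to F_P.
ambiguous :: [(String, [FD])] -> [Var] -> [Pred] -> [Var] -> Bool
ambiguous env quantified ps tvTau = any (`notElem` determined) candidates
  where
    determined = closure (induced env ps) tvTau
    candidates = [ a | a <- quantified, a `elem` concatMap (concat . snd) ps ]
```

For the type `(Mul Int Int a, Mul a Int b) ⇒ b` from Section 4, the predicates become `("Mul", [[], [], ["a"]])` and `("Mul", [["a"], [], ["b"]])`. With the dependency `([0, 1], [2])` the type is not ambiguous, because `a` is determined by the empty left-hand side; with no dependencies declared, it is.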



6.4 Generalizing Inferred Types 

In a standard Hindley-Milner type system, principal types are computed using 
a process of generalization. Given an inferred but unquantified type $P \Rightarrow \tau$, we 
would normally just calculate the set of type variables $T = TV(P \Rightarrow \tau)$, over 
which we might want to quantify, and the set of variables $V = TV(A)$ that are 
fixed in the current assumptions $A$, and then quantify over any variables in the 
difference, $T \setminus V$. In the presence of functional dependencies, however, we must 
be a little more careful: a variable $a$ that appears in $T$ but not in $V$ may still 
need to be treated as a fixed variable if it is determined by $V$. To account for 
this, we should only quantify over the variables in $T \setminus V_{F_P}^+$. 

7 Putting a Name to Functional Dependencies 

The approach described in this paper provides a way for programmers to indicate 
that there are dependencies between the parameters of a type class, but stops 
short of giving those dependencies a name. To illustrate this point, consider the 
following pair of class declarations: 

class U a b | a ↝ b where . . . 
class U a b ⇒ V a b where . . . 

From the first declaration, we know that there is a dependency between the 
parameters of U ; should there not also be a dependency between the parameters 
of V, inherited from its superclass U? Such a dependency could be added by 
changing the second declaration to: 

class U a b ⇒ V a b | a ↝ b where . . . 

but this tells only part of the story. For example, given two predicates U a b 
and V a c, nothing in the rules from Section 6 will allow us to infer that b = c. 
Let us return to the dependency on U and give a name to it by writing u for 
the function that maps each a to the b that it determines. This might even be 
made explicit in the syntax of the language by changing the declaration to read: 

class U a b | u :: a ↝ b where . . . 

Now we can change the declaration of V again to indicate that it inherits the 
same dependency u: 

class U a b ⇒ V a b | u :: a ↝ b where . . . 

Now, given the predicates U a b and V a c, we can infer that b = u a = c, 
as expected. It is not yet clear how useful this particular feature might be, or 
whether it might be better to leave the type checker to infer inherited 
dependencies automatically, without requiring the programmer to provide names for 
them. The current prototype includes an experimental implementation of this 
idea (without making dependency names explicit), but the interactions with 
other language features, particularly overlapping instances, are not yet fully 
understood. Careful exploration of these issues is therefore a topic for future work. 
However, the example does show that there are further opportunities to exploit 
dependency information that go beyond the ideas described in Section 6. 

8 Conclusions and Future Work 

The ideas described in this paper have been implemented in the latest version 
of the Hugs interpreter [7], and seem to work well in practice. Pleasingly, some 
early users have already found new applications for this extension in their own 
work, allowing them to overcome problems that they had previously been unable 
to fix. Others have provided feedback that enabled us to discover places where 
further use might be made of dependency information, as described in Section 7. 

In constructing this system, we have used ideas from the theory of relational 
databases. One further interesting area for future work would be to see if other 
ideas developed there could also be exploited in the design of programming 
language type systems. Users of functional languages are, of course, accustomed to 
working with parameterized datatypes. Functional dependencies provide a way 
to express similar relationships between types, without being quite so specific. 
For example, perhaps similar ideas could be used in conjunction with existential 
types to capture dependencies between types whose identities have been hidden? 

Abstract. We introduce graph reduction technology that implements 
functional languages with control, such as Scheme with call/cc, where 
continuations can be manipulated explicitly as values, and can be 
optimally reduced in the sense of Lévy. The technology is founded on 
proofnets for multiplicative-exponential linear logic, extending the 
techniques originally proposed by Lamping, where we adapt the 
continuation-passing style transformation to yield a new understanding of sharable 
values. Confluence is maintained by returning multiple answers to a 
(shared) continuation. 

Proofnets provide a concurrent version of linear logic proofs, eliminating 
structurally irrelevant sequentialization, and ignoring asymmetric 
distinctions between inputs and outputs (dually, expressions and 
continuations). While Lamping’s graphs and their variants encode an embedding of 
intuitionistic logic into linear logic, our construction implicitly contains 
an embedding of classical logic into linear logic. 

We propose a family of translations, produced uniformly by beginning 
with a continuation-passing style semantics for the languages, employing 
standard codings into proofnets using call-by-value, call-by-name, or 
hybrids of the two to locate proofnet boxes, and converting the proofnets 
to direct style. The resulting graphs can be reduced simply (cut 
elimination for linear logic), have a consistent semantics that is preserved 
by reduction (geometry of interaction, via the so-called context 
semantics), and allow shared, incremental evaluation of continuations (optimal 
reduction). 



1 Introduction 

Expressions and continuations are dual, separate but equal computational 
structures in a programming language. The former provides a value; the latter 
consumes it. Yet evaluating expressions is very familiar, while evaluating continuations 
is considered esoteric, even though both are made of the same stuff. The 
incorporation of continuations as first-class citizens in programming languages was 
not welcomed like the Emancipation Proclamation, but instead regarded warily 
as a kind of witchcraft, with implementation pragmatics that are ill-defined and 
unclear. If expressions and continuations are indeed dual, then so should be the 
technology of their implementation, and the flexibility with which we reason 
about them. Efficient evaluation of one should reveal dual strategies for evaluating 
the other. In short, everything we know about expressions we ought to know 
about continuations. 

We take a significant step towards this equality by formulating a general 
version of graph reduction that implements the sharing and optimal incremental 
evaluation of both expressions and continuations, each evaluated using the same 
primitive operations. By founding our technology on generic tools from logic and 
programming language theory, specifically the CPS transform and its relation 
to linear logic, we are for the first time able to produce a family of related 
implementations in an entirely mechanical way. 

Nishizaki earlier produced a coding of Scheme with call/cc in linear logic, 
via ad hoc reasoning, based on a proof of a proposition of linear logic 
corresponding to the type of call/cc [22]. In contrast, our new contribution is to 
produce Nishizaki’s coding, and many others, by a mechanical process based on 
the denotational semantics of the programming language. Not only do we get a 
much deeper insight into principles, we greatly simplify the problem of 
constructing graph reduction implementations of other languages with explicit control. 
In bringing ideas from logical theory closer to implementation technology, we 
hope to make researchers think about the pragmatics of continuations in simple, 
novel, and useful ways. 

Our methodology is founded on proofnets for multiplicative-exponential 
linear logic, following the beautiful insights of John Lamping [17], who realized 
Jean-Jacques Lévy’s specification of correct, optimal reduction for the λ-calculus 
[18], and of Gonthier, Abadi, and Lévy, who reinterpreted Lamping’s insights in 
the guise of Girard’s geometry of interaction, and the related embedding of 
intuitionistic logic in linear logic [12,13]. Linear logic [11] provides an ideal substrate 
for the implementation of control operators, as it makes no asymmetric 
distinctions between inputs and outputs, or analogous expressions and continuations. 
We extend the optimal reduction technology to implement explicit control and 
sharing of continuations, essentially via an embedding of classical logic in 
linear logic, following a line of research beginning with Griffin and then Murthy 
[14,21]. Our construction is based on continuation-passing style, but generates 
direct-style graphs. This approach extends to implement almost any functional 
language with abortive control operators whose semantics can be described in 
continuation-passing style; for example, Filinski’s symmetric λ-calculus [8], and 
Parigot’s λμ-calculus [23].^ 



^ We have implemented these languages using the techniques in this paper. This work 
will be included in a later, extended version of the manuscript. 

What do optimality and correctness mean in a language with explicit control? 
To understand optimality and sharing in the context of continuations, consider 
the evaluation of a Scheme expression 

([fun1] 
  (call/cc (lambda (a) ([fun2] 
    (call/cc (lambda (b) [exp])))))) 

in the context of some complex continuation κ. The expression [exp] can 
implicitly access its current continuation γ, or explicitly access continuations α and 
β named by a and b. All of these continuations extend the continuation κ of the 
entire expression. Optimality ensures that α, β, and γ share κ, that β and γ 
share α, and so on. If continuations are shared, duplicate work can be avoided 
as continuations are simplified. 

A reduction strategy ρ is correct if for any expression E, if there is some 
strategy σ that reduces E to a normal form, then ρ also reduces E to a normal form. 
In the absence of control operators, normal forms in the λ-calculus are unique, so 
all correct strategies produce the same normal form, if one exists. However, 
control operators destroy the uniqueness of normal forms: let E be the Scheme 
expression (call/cc (lambda (k) ((lambda (x) 1) (k 2)))); a call-by-name 
strategy reduces E to 1, while a call-by-value strategy reduces E to 2. A correct 
evaluation strategy cannot simply choose one of these answers. Define context C 
as (if (= [-] 1) 0 ⊥) and C′ as (if (= [-] 1) ⊥ 0). Since C[E] evaluates 
to 0 under call-by-name and diverges under call-by-value, a correct evaluation of 
E must return 1. But since C′[E] evaluates to 0 under call-by-value and diverges 
under call-by-name, a correct evaluation of E must return 2. Returning both 1 
and 2 in the evaluation of E is not contradictory: it merely amounts to supplying 
both answers to a single shared continuation. 
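
The two answers for E can be reproduced with a small continuation-passing interpreter. This is a conventional sequential evaluator written only to illustrate the example; it is not the graph reduction technique of this paper, and the datatypes and function names below are assumptions of the sketch.

```haskell
data Term
  = Lit Int
  | Var String
  | Lam String Term
  | App Term Term
  | CallCC

data Strategy = CBV | CBN

type Answer = Value
type Cont   = Value -> Answer
type Comp   = Cont -> Answer          -- a computation awaiting its continuation
type Env    = [(String, Comp)]

-- Function values receive their argument as a computation, so that the
-- same value domain serves both strategies.
data Value = IntV Int | FunV (Comp -> Comp)

eval :: Strategy -> Env -> Term -> Comp
eval _ _   (Lit n)   k = k (IntV n)
eval _ env (Var x)   k = case lookup x env of
  Just c  -> c k
  Nothing -> error ("unbound variable " ++ x)
eval s env (Lam x b) k = k (FunV (\arg -> eval s ((x, arg) : env) b))
eval s env (App f a) k =
  eval s env f $ \fv -> case fv of
    FunV h -> case s of
      CBN -> h (eval s env a) k                      -- pass the argument unevaluated
      CBV -> eval s env a (\v -> h (\k' -> k' v) k)  -- evaluate the argument first
    IntV _ -> error "applied a non-function"
eval _ _ CallCC k = k (FunV callcc)
  where
    -- The reified continuation ignores the continuation at its call site
    -- and resumes k1, the continuation captured by call/cc.
    callcc arg k1 = arg $ \fv -> case fv of
      FunV h -> h (\k2 -> k2 (FunV (\x _ -> x k1))) k1
      IntV _ -> error "call/cc applied to a non-function"

runInt :: Strategy -> Term -> Int
runInt s t = case eval s [] t id of
  IntV n -> n
  FunV _ -> error "result is a function"

-- E = (call/cc (lambda (k) ((lambda (x) 1) (k 2))))
exampleE :: Term
exampleE =
  App CallCC (Lam "k" (App (Lam "x" (Lit 1)) (App (Var "k") (Lit 2))))
```

Under these definitions `runInt CBN exampleE` yields 1, because the argument `(k 2)` is never demanded, while `runInt CBV exampleE` yields 2, because evaluating the argument invokes the captured continuation.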

Technical contributions: The efficiency of optimal reduction is based on the 
incremental propagation of sharing nodes. Implementations of optimal reduction 
based on linear logic, as proposed by Gonthier, Abadi, and Lévy, and later by 
Asperti, use proofnet boxes to coordinate these interactions. Nevertheless, the 
boxing strategy only permits the sharing of values. To extend this technology to 
languages with control operators, the key technical question is: where do we put 
the boxes to allow the sharing of continuations? 

Our solution exploits the continuation-passing style (CPS) transformation. 
If a language with control operators can be translated into the pure λ-calculus 
using a CPS transformation, we can use existing technology to construct the 
graph of the CPS transformed term. The CPS translation of a term is more 
verbose than the original term, and more expensive to reduce to a normal form. We 
show that for a language with abortive continuations, the graphs of CPS terms 
can be mechanically converted back to direct style, maintaining the boxing of 
continuations induced by the CPS term. The transformation “rotates” principal 
ports of boxes so that continuations can be copied. We prove that this 
transformation does not change the underlying denotational semantics of the terms, 
as defined by the geometry of interaction. This approach can be applied to any 
variant of the CPS transformation, and any strategy for coding pure λ-terms as 
proofnets. The result is a family of possible translations into graphs, and these 
graphs can be optimally reduced in the sense of Lévy’s labelled terms. 

Traditionally, compiler optimizations have addressed sharing of expressions. 
The technology presented here provides a new systematic basis on which to 
optimize the sharing of continuations. 

In summary, all of the translations we outline possess a simple graph 
reduction on translated terms (cut elimination for linear logic), a consistent semantics 
that is preserved by reduction (geometry of interaction, via the so-called 
context semantics of Gonthier [12]), and a mechanism whereby continuations can 
be incrementally evaluated (optimal reduction). The situation of this technology 
within multiplicative-exponential linear logic ensures that the semantic 
characterization given is equivalent to the operational semantics of graph reduction. 
Viewing data types as games, and contexts (in the sense of Gonthier) as moves 
in a composite game, one immediately suspects that categories of games should 
provide the right kind of “more abstract” semantics for calculi with explicit 
control. Furthermore, full abstraction theorems for languages with control seem to 
be easily accessible, given the full completeness results for linear logic. 



2 Preliminaries 

We briefly sketch the construction of graphs to implement λ-calculus; more 
details can be found elsewhere [1,12]. Graphs are composed of wires and fixed-arity 
nodes, as well as boxes, which enclose subgraphs. The λ-calculus is encoded using 
apply nodes (@), lambda nodes (λ), sharing nodes, weakening nodes, 
and croissants. A box allows a subgraph to be duplicated by a sharing 
node, or discarded by a weakening node. When sharing is no longer required, a 
croissant can open the box, allowing interaction with the subgraph inside. The 
meaning of the other nodes should be intuitive. 

One port of each node or box is designated as the principal port. Other 
ports are auxiliary ports. Reduction takes place when two graph constructs are 
connected at their principal ports. A box can also interact with another box at 
its auxiliary port. Global reduction rules are shown in Figure 1, where black dots 
indicate principal ports. Graphs can also be reduced by local reduction rules, 
described elsewhere [1,12]. Local reduction of the graph of a λ-term implements 
Lévy’s optimal reduction [18]. 

Fig. 1. Global reduction rules 

The big question in implementing λ-calculus within this framework is where 
to put the boxes to allow the unrestricted sharing of values. We mention two 
commonly used boxing disciplines; see [19] for others. The call-by-value (CBV) 
coding boxes the graph of every λ-abstraction. Correspondingly, a croissant is 
placed on the function position of every apply node. The call-by-name (CBN) 
coding boxes the graph of the argument of every function application. 
Correspondingly, a variable reference is implemented by a croissant. These codings 
amount to Curry-Howard style embeddings of intuitionistic logic in linear logic. 
Figure 2 illustrates the CBN coding of the λ-calculus, which we will use in this 
paper. Note that the left side of a lambda node leads to the graph of the body, 
while the right side leads to the (perhaps shared) occurrence of the bound 
variable. Correspondingly, the left side of an apply node leads to the context of the 
application, while the right side leads to the argument. Our results are equally 
applicable to the CBV coding, and to any other consistent boxing strategy. 




[Figure panels: $\mathcal{G}_n[\![x]\!]$, $\mathcal{G}_n[\![\lambda x.M]\!]$, $\mathcal{G}_n[\![M\,N]\!]$] 

Fig. 2. CBN coding of λ-terms 



Graphs of simply-typed λ-terms can be assigned linear-logic types. In 
particular, the type of a box is !A, allowing sharing of the A-typed value inside. 
Regardless of the boxing strategy, the constraints of linear logic typing imply: 



Proposition 1. Boxes never get in the way of β reductions. 



As a consequence, optimal (local) reduction reduces any two graphs with the 
same arrangement of apply, lambda, and sharing nodes in the same way. 

Asperti has proposed some optimizations to these boxing strategies [1]. The 
simplest is to apply the following rule to the translation of a λ-term: 









We apply this optimization, without comment, throughout the paper. 

Implementing control operators: The above codings make the continuation 
and argument of an application equally accessible, the former on the left side, 
and the latter on the right side of the apply node. References to these values 
are similarly equally accessible to a λ-abstraction, at the left and right side 
of the lambda node. Because the λ-calculus can only express the sharing of 
arguments, via parameter binding, the boxing strategies only ensure that the 
value of the argument is boxed. Control operators such as Scheme’s call/cc, 
however, introduce the possibility to name, and thus duplicate and discard, the 
continuation. 

3 Continuations in the λ-Calculus 

We now derive a family of graph encodings for terms in the λ-calculus with 
call/cc from encodings of the corresponding CPS terms. For pure λ-terms, this 
approach produces graphs with the same arrangement of lambda, apply, and 
sharing nodes as previous translations, and thus such graphs reduce to normal 
form in the optimal number of beta steps, as given by Lévy’s specification. 

3.1 The CPS Transformation 

Continuation-passing style (CPS) is a style of programming in which the 
continuation at each point is represented explicitly as a function. Because the 
continuation function makes explicit the remaining computation at the current 
program point, a CPS term necessarily specifies an evaluation order. A λ-calculus 
program can be converted to CPS automatically using a CPS transformation. 
Furthermore, the control operator call/cc can be translated into CPS. Typically, 
a CPS transformation encodes a CBV or CBN evaluation order; however, any 
consistent mixture is possible [15]. 

Plotkin’s CBV and CBN CPS transformations, extended with the translation 
of call/cc, are shown in Figures 3 and 4, respectively [24]. For typed terms, 
these transformations induce a corresponding transformation on types. Define 
$a^* = a$ for any base type $a$ (including $\bot$) and 
$(\alpha \to \beta)^* = \alpha^* \to \neg\neg\beta^*$, where 
$\neg\tau = \tau \to \bot$; then the CBV CPS transformation maps a derivation 
$\{x : \sigma \in \Gamma\} \vdash E : \tau$ to 
$\{x : \sigma^* \mid x : \sigma \in \Gamma\} \vdash \mathcal{C}_v[\![E]\!] : \neg\neg\tau^*$. 
Similarly, define $a^\dagger = a$ and 
$(\alpha \to \beta)^\dagger = \neg\neg\alpha^\dagger \to \neg\neg\beta^\dagger$; 
then the CBN CPS transformation maps the same derivation to 
$\{x : \neg\neg\sigma^\dagger \mid x : \sigma \in \Gamma\} \vdash \mathcal{C}_n[\![E]\!] : \neg\neg\tau^\dagger$. 
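
As a small worked instance of the CBV type translation, take base types $a$ and $b$ and the type $((a \to b) \to a) \to a$, which is the type usually assigned to call/cc. The choice of example is an assumption made here for illustration; only the defining clauses of $(\cdot)^*$ come from the text. Unfolding those clauses one arrow at a time gives:

```latex
\begin{align*}
(((a \to b) \to a) \to a)^*
  &= ((a \to b) \to a)^* \to \neg\neg a^* \\
  &= ((a \to b)^* \to \neg\neg a^*) \to \neg\neg a \\
  &= ((a \to \neg\neg b) \to \neg\neg a) \to \neg\neg a
\end{align*}
```

Each arrow contributes one double negation on its result type, and base types are left unchanged.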



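\begin{align*}
\mathcal{C}_v[\![x]\!] &= \lambda\kappa.\,\kappa\,x \\
\mathcal{C}_v[\![\lambda x.M]\!] &= \lambda\kappa.\,\kappa\,(\lambda x.\lambda k.\,\mathcal{C}_v[\![M]\!]\,k) \\
\mathcal{C}_v[\![M\,N]\!] &= \lambda\kappa.\,\mathcal{C}_v[\![M]\!]\,(\lambda v.\,\mathcal{C}_v[\![N]\!]\,(\lambda w.\,v\,w\,\kappa)) \\
\mathcal{C}_v[\![\text{call/cc}]\!] &= \lambda\kappa.\,\kappa\,(\lambda f.\lambda k.\,f\,(\lambda v.\lambda c.\,k\,v)\,k)
\end{align*}

Fig. 3. CBV CPS transformation of the λ-calculus, including call/cc 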



Replacing the λ-terms produced by a CPS transformation by the corresponding 
graphs gives a translation of terms into graphs in which the continuation is 
accessible as a sharable value. Figure 5 presents the graph translation 
corresponding to the CBV CPS transformation. We have used the CBN boxing strategy, 
although any strategy can be used. The translation corresponding to the CBN 
CPS transformation is similar. Only $\bot$ types are indicated. 

\begin{align*}
\mathcal{C}_n[\![x]\!] &= \lambda\kappa.\,x\,\kappa \\
\mathcal{C}_n[\![\lambda x.M]\!] &= \lambda\kappa.\,\kappa\,(\lambda x.\lambda k.\,\mathcal{C}_n[\![M]\!]\,k) \\
\mathcal{C}_n[\![M\,N]\!] &= \lambda\kappa.\,\mathcal{C}_n[\![M]\!]\,(\lambda v.\,v\,(\mathcal{C}_n[\![N]\!])\,\kappa) \\
\mathcal{C}_n[\![\text{call/cc}]\!] &= \lambda\kappa.\,\kappa\,(\lambda f.\lambda k.\,f\,(\lambda a.\,a\,(\lambda q.\,q\,(\lambda v.\lambda c.\,v\,k))\,k))
\end{align*}

Fig. 4. CBN CPS transformation of the λ-calculus, including call/cc 



Fig. 5. Graph translation based on the CBV CPS transformation and the CBN boxing 
strategy 



This implementation strategy, while straightforward, is unsatisfactory. The 
CPS transformation introduces lambda and apply nodes that are not part of the 
original term. Thus, optimal reduction of the resulting graph does not reduce 
the original term using the minimal number of β steps. Indeed, the number of 
β steps is affected by the CPS transformation chosen. Furthermore, the graph 
translation does not exploit the symmetry between the left side of a lambda or 
apply node, which connects to the continuation of a function application, and the 
right side, which connects to the argument. The CPS encoding does, however, 
produce a graph in which continuations are consistently boxed. Thus, we would 
like a graph translation generating the same arrangement of lambda and apply 
nodes for pure λ-terms as the translations defined in Section 2, while retaining 
the boxing of continuations suggested by the CPS transformation. 

3.2 The DS Transformation on Graphs 

Essentially, we would like to eliminate the lambda and apply nodes that construct 
and manipulate continuations. To simplify the graph, we exploit the $\bot$ return 
type of every continuation and continuation abstraction. In a CPS program, we 
are not interested in the result of type $\bot$, but instead in the value passed to the 
initial continuation. We can show that the computation of a value of a non-$\bot$ type 
cannot depend on an edge transmitting a value of $\bot$ type. The simple conclusion 
is to remove all such edges. Because this transformation eliminates continuations 
from CPS graphs, we refer to it as the direct-style (DS) transformation. 

Removing edges from the graph of course affects the nodes incident upon 
these edges. Figure 6 shows transformation rules sufficient to treat CPS terms. 
$\bot$-typed edges in other positions are treated similarly. 



Fig. 6. DS graph transformation rules 



While the graph codings of Section 2 were inspired by embeddings of 
intuitionistic logic into linear logic, they can equally well implement untyped terms. 
Here, however, we do require that $\bot$-typed values are used consistently. 

The results of applying the DS transformation to the graph translations 
based on the CBV and CBN CPS transformations are shown in Figures 7 and 
8, respectively. We refer to these translations as the CBV$_{CPS/N}$ and CBN$_{CPS/N}$ 
translations, respectively. Both achieve our goals: The arrangement of apply 
and lambda nodes in the translation of a pure λ-term is identical to that of the 
codings presented in Section 2, and continuations are boxed, allowing them to 
be duplicated or discarded. 



[Panels: translations of x, λx.M, M N, and call/cc] 

Fig. 7. DS graph translation derived from the CBV CPS transformation 



3.3 Correctness of the DS Transformation 

A graph can be viewed as a set of apply and lambda nodes connected by edges 
that may contain sharing information, as controlled by sharing nodes, croissants, 
and box boundaries. Because the DS transformation only modifies (i.e., eliminates) 
apply and lambda nodes, it does not directly affect this sharing information. 

Fig. 8. DS graph translation derived from the CBN CPS transformation 

Consider the path between two edges that are ultimately connected by 
reduction. We prove that the DS transformation maintains the way in which such a 
path traverses apply and lambda nodes. Because the arrangement of the other 
nodes is not modified by the DS transformation, they continue to behave in the 
same way as well. 

Throughout, we assume the graph is simply typed. 

Definition 1. A path is a directed sequence of connected edges, labelled as 
shown below, where for each node traversed on the path, the two edges 
incident on the node connect respectively to the principal port, and to an auxiliary 
port. The label of a path is the concatenation of the labels of the edges. 




Definition 2. A well-balanced path is a path whose label is described by the 
following grammar, where 1 is the empty word: 

$B ::= 1 \mid (_l\, B\, )_l \mid (_r\, B\, )_r \mid [_l\, B\, ]_l \mid [_r\, B\, ]_r \mid B\,B$ 



An unbalanced path is a path that is not well-balanced and whose label is a 
subword of a label derivable from B. 
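
The three-way split into well-balanced, unbalanced, and neither can be decided with a single stack pass over a label. The sketch below is a minimal illustration under stated assumptions: labels are Python lists of tokens, and the token spellings (`"(l"`, `")l"`, `"[r"`, and so on) are hypothetical names for the four bracket pairs of the grammar.

```python
# Hypothetical token names for the four bracket pairs of the grammar for B.
OPEN_TO_CLOSE = {"(l": ")l", "(r": ")r", "[l": "]l", "[r": "]r"}
CLOSERS = set(OPEN_TO_CLOSE.values())

def classify(label):
    """Classify a path label as 'well-balanced', 'unbalanced', or 'neither'."""
    stack = []               # openers still waiting for their closer
    unmatched_closers = 0    # closers whose opener lies outside the label
    for sym in label:
        if sym in OPEN_TO_CLOSE:
            stack.append(sym)
        elif sym in CLOSERS:
            if stack:
                # A closer facing the wrong opener can never be repaired by
                # extending the label, so it is not a subword of any B-word.
                if OPEN_TO_CLOSE[stack.pop()] != sym:
                    return "neither"
            else:
                unmatched_closers += 1
        else:
            raise ValueError(f"unknown symbol: {sym!r}")
    if not stack and unmatched_closers == 0:
        return "well-balanced"
    # Only unmatched closers followed by unmatched openers remain; adding the
    # missing partners on either side yields a word derivable from B.
    return "unbalanced"
```

A label with no mismatched pair always reduces to a run of unmatched closers followed by a run of unmatched openers, which is why such a label is a subword of some well-balanced label.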

This definition of well-balanced path generalizes that of Asperti and Laneve 
[2] to include paths that cannot occur in the translation of an ordinary λ-term, 
but can occur in the image of the DS transformation. Note that some paths are 
neither well-balanced nor unbalanced, for example a path that enters an apply 
node on the left and immediately exits the next lambda node on the right, with 
label $(_l\, ]_r$. Unbalanced paths describe correct information flow (in the sense of 
the geometry of interaction) in a well-typed graph, but cannot reduce to form a 
beta redex. 

Proposition 2. Let $p$ be a path that is converted to an edge by the DS 
transformation. Then, $p$ is either well-balanced or unbalanced. 

We can show the following by induction on the structure of a path in a 
well-typed graph: 

Proposition 3. In a well-balanced path, the first and last edges have identical 
types. 

An unbalanced path has a label of the form $B\,)\,A$ or $A\,(\,B$, where ) and ( are 
one of the four forms of open/closed parentheses, and $A$ is a well-balanced or 
unbalanced path. Applying Proposition 3 to $B$, we can show: 

Proposition 4. In an unbalanced path, either the first or last edge has arrow 
type. 

For a path $p$, if $\mathcal{D}(p)$ is an edge, every node of $p$ must be eliminated by the 
DS transformation. Thus, using Proposition 4, we can show: 

Proposition 5. Let $p$ be an unbalanced path. If $\mathcal{D}(p)$ is an edge, then either the 
first or last edge of $p$ has type $A \to \bot$, for some $A$. 



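Theorem 1. (Soundness) Let $G$ be the graph of a pure λ-term. Then the 
diagram 

\[
\begin{array}{ccc}
G & \longrightarrow & G' \\
\downarrow{\scriptstyle\mathcal{D}} & & \downarrow{\scriptstyle\mathcal{D}} \\
\mathcal{D}(G) & \longrightarrow & \mathcal{D}(G') = G''
\end{array}
\]

commutes, where: 

1. (The top path can be simulated by the bottom path): If $G$ reduces to $G'$ by 
global reductions, then $\mathcal{D}(G)$ reduces to $\mathcal{D}(G')$ by global reductions. 

2. (The bottom path can be simulated by the top path): If $\mathcal{D}(G)$ reduces to $G''$ by 
global reductions, then there is some $G'$ such that $G$ reduces to $G'$ by global 
reductions, and $G'' = \mathcal{D}(G')$. 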

Proof. To prove that the top path can be simulated by the bottom path, we 
use induction on the number of reduction steps in the top path. Observe that 
the effect of the DS transformation on a node is completely local, determined 
only by the types of edges incident on the node. Thus, the effect of the DS 
transformation on source nodes not affected by the reduction step is the same in 
both the source graph and the reduced graph; we need only consider the effect 
of the DS transformation on the nodes involved in the reduction step. Consider 
each possible redex in the source graph: 

— A β-redex with function type $A \to B$, where $B \neq \bot$. Since the redex is not 
affected by the DS transformation, we can perform the same reduction step 
in $\mathcal{D}(G)$, and the resulting graph has the form $\mathcal{D}(G')$. 

— A β-redex with function type $A \to \bot$. Reducing a beta redex creates an 
edge of the argument type (connecting the argument to the occurrence of 
the parameter) and an edge of the return type (connecting the result of 
the body to the context of the application). Thus, reducing a redex with 
function type $A \to \bot$ creates an edge of type $A$ and an edge of type $\bot$ in $G'$. 
The edge of type $\bot$ is then eliminated by the DS transformation. Applying 
the DS transformation to $G$ directly also eliminates the $\bot$-typed edges and 
connects the $A$-typed edges, thus having the same effect as beta reduction. 

— Box duplication, absorption, croissant-box interaction. These operations each 
involve an edge of type $!A$, which is unaffected by the DS transformation. 
Thus, the identical operation can be performed in $\mathcal{D}(G)$. 

To show that reductions in the bottom path can be simulated by reductions 
in the top path, we proceed by induction on the number of reduction steps from 
$\mathcal{D}(G)$ to $G''$. Since the DS transformation only converts lambda and apply nodes to 
edges, an edge $e$ between two interaction ports in $\mathcal{D}(G)$ corresponds to a path $p$ 
in $G$ consisting of a sequence of lambda and apply nodes, such that $\mathcal{D}(p) = e$. If 
$p$ is not an edge, we show that $p$ must β-reduce to one; thus, a single reduction 
step in $\mathcal{D}(G)$ is simulated by a sequence of β-steps in $G$, followed by the same 
reduction step as performed in $\mathcal{D}(G)$. Because $p$ consists of only lambda and apply 
nodes, it suffices to show that $p$ is a well-balanced path. Consider each possible redex 
in $\mathcal{D}(G)$: 

— A β-redex. If $p$ were unbalanced, by Proposition 5, either the first or last 
edge would have type $A \to \bot$, for some $A$. In that case, the lambda or apply 
node connected to that edge would be eliminated by the DS transformation, 
contradicting the fact that $p$ connects nodes that form a beta redex in $\mathcal{D}(G)$. 
Thus, $p$ must be well-balanced. 

— Box duplication, box absorption, croissant-box interaction. In all of these 
cases, the type of the edge between the interaction ports is $!A$. Because the 
boxes, croissants, and sharing nodes are not affected by the DS 
transformation, the first and last edges of $p$ must also have type $!A$. By Proposition 4, 
$p$ cannot be unbalanced. Thus, by Proposition 2, $p$ must be well-balanced. 



3.4 Embeddings of Classical Logic 

Each of our proofnet implementations implicitly encodes, via an extended 
Curry-Howard correspondence, an embedding of classical logic in 
multiplicative-exponential linear logic (MELL). This family of encodings results from the 
mix-and-match of standard double-negation embeddings of classical logic into 
intuitionistic logic, composed with embeddings of intuitionistic logic into MELL. We 
discuss these constructions for minimal implicational logic with $\neg\neg$-elimination. 

Let $[\alpha \to \beta] = {!}[\alpha] \multimap [\beta]$ be the Girard translation of intuitionistic 
implication in linear logic; similarly, let 
$\langle\alpha \to \beta\rangle = {!}(\langle\alpha\rangle \multimap \langle\beta\rangle)$ be the 
Gonthier-Abadi-Lévy translation (see [12]). By standard linear logic identities, 
$[\neg\neg\tau] = {?!}[\tau]$ and $\langle\neg\neg\tau\rangle = {!?}\langle\tau\rangle$, 
where ! and ? are the (dual) exponential modalities. 

What does this mean in terms of graph reduction? When a subgraph $G$ with 
root typed $?!\alpha$ is substituted into a sharable context $C$, the ? marks a croissant 
at the root of $G$ that breaks the box around $C$, which then shares the value of 
type $!\alpha$. Dually, if $G$ has type $!?\beta$, the context has type $?!(\beta)^{\perp}$, and the protocol 
for box opening and sharing reverses the role of context and value. 

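Recall $(\alpha \to \beta)^* = \alpha^* \to \neg\neg\beta^*$ is the translation of $\alpha \to \beta$ induced by the 
CBV CPS transformation, and 
$(\alpha \to \beta)^\dagger = \neg\neg\alpha^\dagger \to \neg\neg\beta^\dagger$ is the translation 
induced by the CBN CPS transformation. The CPS translations of a function 
(classical proof) $E : \alpha \to \beta$ result in a function of type $\neg\neg(\alpha \to \beta)^*$ or 
$\neg\neg(\alpha \to \beta)^\dagger$; we then have 

\begin{align*}
[\neg\neg(\alpha \to \beta)^*] &= {?!}({!}[\alpha^*] \multimap {?!}[\beta^*]) &
\langle\neg\neg(\alpha \to \beta)^*\rangle &= {!?!}(\langle\alpha^*\rangle \multimap {!?}\langle\beta^*\rangle) \\
[\neg\neg(\alpha \to \beta)^\dagger] &= {?!}({!?!}[\alpha^\dagger] \multimap {?!}[\beta^\dagger]) &
\langle\neg\neg(\alpha \to \beta)^\dagger\rangle &= {!?!}({!?}\langle\alpha^\dagger\rangle \multimap {!?}\langle\beta^\dagger\rangle)
\end{align*}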

Other variants of this mix-and-match style are possible. Fewer modalities mean 
fewer boxes and greater implementation efficiency. 
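
The four linear types can be recomputed mechanically by composing the two CPS type translations with the two embeddings into linear logic. The sketch below does this for base types `a` and `b`, so that the translation of a base type is just its name. The tuple encodings, the function names, and the ASCII spelling `-o` for linear implication are assumptions made for this illustration.

```python
# Source types: a base type is a string, ("->", s, t) an arrow, ("nn", t) is
# double negation. All encodings and names here are hypothetical.
def arrow(s, t): return ("->", s, t)
def nn(t): return ("nn", t)

def star(t):      # CBV: a* = a, (s -> t)* = s* -> nn t*
    if isinstance(t, str):
        return t
    return arrow(star(t[1]), nn(star(t[2])))

def dagger(t):    # CBN: a+ = a, (s -> t)+ = nn s+ -> nn t+
    if isinstance(t, str):
        return t
    return arrow(nn(dagger(t[1])), nn(dagger(t[2])))

def girard(t):    # [s -> t] = ![s] -o [t], [nn t] = ?![t]
    if isinstance(t, str):
        return ("atom", t)
    if t[0] == "nn":
        return ("?", ("!", girard(t[1])))
    return ("-o", ("!", girard(t[1])), girard(t[2]))

def gal(t):       # <s -> t> = !(<s> -o <t>), <nn t> = !?<t>
    if isinstance(t, str):
        return ("atom", t)
    if t[0] == "nn":
        return ("!", ("?", gal(t[1])))
    return ("!", ("-o", gal(t[1]), gal(t[2])))

def tight(f):
    """Parenthesize implications that appear under a modality or as operands."""
    return "(" + show(f) + ")" if f[0] == "-o" else show(f)

def show(f):
    if f[0] == "atom":
        return f[1]
    if f[0] in ("!", "?"):
        return f[0] + tight(f[1])
    return tight(f[1]) + " -o " + tight(f[2])

fn = arrow("a", "b")
```

Counting the `!` and `?` symbols in each printed type gives a rough measure of how many boxes and croissants a translation introduces.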



4 Related Work 

We consider three areas of related work: other CPS transformations, other ap- 
proaches to converting CPS programs back to direct style, and other connections 
between control operators and linear logic. 

Optimizing the CPS transformation: The Plotkin CPS transformations create 
many “administrative” redexes involving the application of a continuation or 
continuation abstraction [24]. Our DS transformation converts administrative 
redexes into box-croissant redexes, adding a bureaucratic cost to optimal re- 
duction [1,20]. More optimized CPS transformations [7,25] could generate more 
efficient implementations. 

Converting CPS programs back to direct style: Danvy first investigated the 
problem of converting a CPS program back to direct style [5], in work later extended 
with Lawall. The conversions were only on terms that could be output by a CPS 
transformation. Our DS transformation also relies on a uniform, but weaker 
property: values of type $\bot$ must occur consistently, and do not contribute to the 
final result. At the extreme, our DS transformation is simply the identity on the 
graphs of DS terms. 

Relating languages with control operators to linear logic: Nishizaki also 
investigated encodings of λ-calculus plus call/cc in proofnets [22]. He showed that 
normalization of these proofnets is complete with respect to normalization in 
the term language. He began by adding modalities in an ad hoc manner 
(induced mechanically by our CBV$_{CPS/N}$ translation) to the type $!A \multimap B$, allowing 
sharing of both values and continuations. His more complex translation is an 
optimization of our CBV$_{CPS/N}$ translation, eliminating some box-croissant 
interactions corresponding to administrative redexes. Because Nishizaki derived a 
translation from the types rather than from the semantics, he had to prove that 
the resulting graphs model the semantics of the language. The correctness of 
our approach relies only on the correctness of the CPS transformation, and on 
the correctness of the DS transformation on graphs, which is independent of the 
language being implemented. 

In a sequel to his earlier work on symmetric λ-calculus, Filinski used linear 
logic as a tool for understanding continuations [9]. Some of the linear types he 
proposed for continuations appear in our codings, the most common being $?!\alpha$, 
resulting from the DS transformation of a graph with type 
$(\alpha \multimap \bot) \multimap \bot$. Griffin, and later Murthy, showed the relation between 
so-called $\neg\neg$-embeddings of classical logic in intuitionistic logic, and the 
implementation of control operators [14,21]. In particular, they showed how varieties 
of the CPS transformation provide the constructive content of such embeddings. 
We further translate such terms into proofnets in direct style, eliminating the 
administrative redexes. The result is a family of constructive embeddings of 
classical logic into linear logic. 

5 Future Work 

Since the continuation created by call/cc is abortive, a term containing call/cc 
can reduce to different normal forms; its CPS counterpart, like all pure λ-terms, 
has only one normal form. Because the DS transformation produces boxes 
according to the CPS transformation providing its input, it should be possible to 
identify, among the shared normal forms it can return, the answer that would 
have been produced by the CPS-converted input. We leave further analyses of 
these observations to future work. 

Efficiently managing the reified continuation is a significant problem in imple- 
menting languages with control operators [4,16]. Proofnet implementations sug- 
gest the possibility of evaluating programs containing control operators using 
optimal reduction, with a minimal copying of shared values. Nevertheless, we 
are exchanging the savings of optimal reduction for the overhead of box mana- 
gement. Further experiments are needed to understand whether the exchange is 
cost-effective, and if it can be further optimized by better box technology. 

Our proofnet technology might be extended to languages with functional 
control operators, such as Danvy and Filinski’s shift and reset, and Sitaram 
and Felleisen’s control and prompt [6,27]. While shift and reset are defined 
in terms of a CPS transformation, continuations do not have return type $\bot$; the 
DS transformation is then inapplicable. Both shift and reset, and control 
and prompt can be defined in terms of call/cc and a reference cell [10,27]. 
Bawden has shown how to implement reference cells using sharing graphs [3], so 
this strategy may still lead to an interesting proofnet implementation. 

6 Conclusions 

We have shown how to implement various languages with explicit control using 
graph reduction, where the graphs are structured as proofnets from linear 
logic. The principal technical difficulty in such codings is the location of boxes, 
which allow computations to be shared. Rather than specifying a fixed scheme 
for locating boxes, we have introduced a general methodology based on the CPS 
transform. Different versions of CPS, followed by our DS transform on graphs, 
produce a wide range of consistent schemes for locating boxes in proofnets. The 
noble art of linear decorating, to repeat the phrase of Schellinx [26], has been 
replaced by a factory. 

The theoretical foundation of our implementation technology means that 
we have a consistent semantics provided by the geometry of interaction, and 
a means of incrementally evaluating continuations via optimal evaluation. The 
codings may clarify full abstraction theorems for languages with explicit control, 
given the full completeness results that are known for linear logic. But the ge- 
nuine progress reflected in the presented techniques is the technology transfer of 
logic and proofnets to the mundane algorithmics of implementation. The prag- 
matics of double negation in logic, for example, is just packaging: the boxing of 
sharable data so that they can interact with each other. Further implementation 
improvements amount to a better understanding of where to put boxes. A gene- 
ration of compiler writers has spent considerable effort optimizing the efficiency 
of sharing expressions. We have presented a systematic basis on which to opti- 
mize the sharing of continuations, providing new territory for similar efficiency 
improvements. 

Acknowledgments. We thank Alan Bawden and Olivier Danvy for commenting 
on a draft of this paper. 

Abstract. We present a module calculus for studying a simple model of 
link-time compilation. The calculus is stratified into a term calculus, a 
core module calculus, and a linking calculus. At each level, we show that 
the calculus enjoys a computational soundness property: if two terms are 
equivalent in the calculus, then they have the same outcome in a 
small-step operational semantics. This implies that any module transformation 
justified by the calculus is meaning preserving. This result is interesting 
because recursive module bindings thwart confluence at two levels of our 
calculus, and prohibit application of the traditional technique for 
showing computational soundness, which requires confluence. We introduce 
a new technique, based on properties we call lift and project, that uses a 
weaker notion of confluence with respect to evaluation to establish 
computational soundness for our module calculus. We also introduce the weak 
distributivity property for a transformation $T$ operating on modules $D_1$ 
and $D_2$ linked by $\oplus$: $T(D_1 \oplus D_2) = T(T(D_1) \oplus T(D_2))$. We argue that 
this property finds promising candidates for link-time optimizations. 



1 Introduction 

We present a module calculus for a purely functional language that is a tool for 
exploring the design space for a simple form of link-time compilation. Link-time 
compilation lies in the relatively unexplored expanse between whole-program 
compilation, in which the entire source program is compiled to an executable, 
and separate compilation, in which source program modules are independently 
compiled into fragments, which are later linked to form an executable. In the 
link-time compilation model (1) source program modules are first partially com- 
piled into intermediate language modules; (2) intermediate modules are further 
compiled when they are combined, taking advantage of usage information ex- 
posed by the combination; and (3) when all intermediate modules have been 
combined into a final closed module, it is translated into an executable. 

Link-time compilation can potentially provide more reusability than 
whole-program compilation and more efficiency than separate compilation. While 
separate compilation offers well-known benefits for program development and code 
reuse, a drawback is that the compilation of one module cannot take advantage 
of usage information in the modules with which it is later linked. In contrast, 
link-time compilation can use this information to perform optimizations and 
choose specialized data representations more efficient than the usual uniform 
representations for data passed across module boundaries. 

In this paper we take some first steps towards formalizing link-time 
compilation. There are three main contributions of this work. First, we present a 
stratified untyped call-by-value module calculus that at every level satisfies a 
computational soundness property¹: if two expressions can be shown 
equivalent via calculus steps, then their outcomes relative to a small-step operational 
semantics will be observably equal. This implies that any transformation 
expressible as a sequence of calculus steps (such as constant propagation and folding, 
function inlining, and many others) is meaning preserving. 

Second, our technique for proving soundness is interesting in its own right. 
Traditional techniques for showing this property (e.g., [Plo75,AF97]) require the 
language to be confluent, but the recursive nature of module bindings destroys 
confluence. In order to show that our module calculus has soundness, we intro- 
duce a new technique for proving this property based on a weaker notion we 
call confluence with respect to evaluation. We replace the confluence and stan- 
dardization of the traditional technique for proving soundness with symmetric 
properties we call lift and project. 

Third, we sketch a simple model of link-time compilation and introduce the 
weak distributivity property as one way to find candidates for link-time opti- 
mizations. We show that module transformations satisfying certain conditions 
are weakly distributive, and demonstrate these conditions for some examples of 
meaning preserving transformations. 
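
To make the weak distributivity property $T(D_1 \oplus D_2) = T(T(D_1) \oplus T(D_2))$ concrete, the following Python sketch checks it on one small example. Everything in it is an assumption made for illustration: modules are dictionaries from labels to terms, a term is an integer, a label name, or an addition `("+", a, b)`, and the transformation `transform` is constant propagation plus folding iterated until nothing changes. It is not the paper's formal development.

```python
def fold(term, consts):
    """Replace labels bound to integer constants, then fold additions."""
    if isinstance(term, int):
        return term
    if isinstance(term, str):            # a label reference
        return consts.get(term, term)
    op, a, b = term                      # assumption: the only operator is "+"
    a, b = fold(a, consts), fold(b, consts)
    if isinstance(a, int) and isinstance(b, int):
        return a + b
    return (op, a, b)

def transform(module):
    """T: iterate constant propagation and folding until nothing changes."""
    while True:
        consts = {l: t for l, t in module.items() if isinstance(t, int)}
        new = {l: fold(t, consts) for l, t in module.items()}
        if new == module:
            return new
        module = new

def link(d1, d2):
    """Link two modules; their bound labels must be disjoint."""
    clash = set(d1) & set(d2)
    if clash:
        raise ValueError(f"labels bound twice: {sorted(clash)}")
    return {**d1, **d2}

d1 = {"A": ("+", 1, 2)}      # [A -> 1 + 2]
d2 = {"B": ("+", "A", 1)}    # [B -> A + 1], imports A
```

On this example, optimizing the two modules separately leaves `B` as `A + 1`, since `A` is unknown inside `d2`; only re-running the transformation after linking exposes the constant. That gap is what makes the property weak rather than full distributivity.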

Our work follows a long tradition of using untyped calculi for reasoning about 
programming language features: e.g., call-by-name vs. call-by-value semantics 
[Plo75], call-by-need semantics [AFM+95,AF97], state and control [FH92], and 
sharing and cycles [AK97,AB97]. Our notion of confluence with respect to 
evaluation avoids cyclic substitutions in the operational semantics, and so is related 
to the acyclic substitution restriction of Ariola and Klop [AK97]. 

This work is part of a renewed interest in linking issues that was inspired by 
Cardelli’s call to arms [Car97]. Recent work on module systems and linking has 
focused on such issues as: sophisticated type systems for modules [HL94,Ler94]; 
the expressiveness of module systems (e.g., handling features like recursive 
modules [FF98,CHP97,AZ99], inheritance and mixins [DS96,AZ99] and dynamic 
linking [FF98,WV99]); binary compatibility in the context of program 
modifications [SA93,DEW99]; and modularizing module systems [Ler96,AZ99]. There 
has been relatively little focus on issues related to link-time optimization; 
exceptions are [Fer95] and recent work on just-in-time compilers (e.g., [PC97]). 

¹ We will often abbreviate the name of this property as “soundness”. 

E. Machkasova and F.A. Turbak 

Our work stands out from other work on modules in two important respects. 
First, we partition the reduction relation of the calculus ($\to$) into evaluation 
(sometimes called standard) steps ($\Rightarrow$) that define a small-step operational 
semantics and non-evaluation (non-standard) steps ($\circ\!\!\to$). While this partitioning is 
common in the calculus world (e.g., [Plo75,FH92,AF97]), it is rare in the module 
world. Typical work on modules (e.g., [Car97,AZ99]) gives only an operational 
semantics for modules. Yet in the context of link-time compilation, the notion 
of reduction in a calculus is essential for justifying meaning preserving program 
transformations. Without non-evaluation steps, even simple transformations like 
transforming $[F \mapsto \lambda x.(1 + 2)]$ to $[F \mapsto \lambda x.3]$ or 
$[A \mapsto 4, F \mapsto \lambda x.x + A]$ to $[A \mapsto 4, F \mapsto \lambda x.x + 4]$ 
are difficult to prove meaning preserving. 

Second, unlike most recent work on modules (with the notable exception of 
[WV99]), our work considers only an untyped module language. There are several 
reasons for this. First, types are orthogonal to our focus on computational 
soundness and weak distributivity; types would only complicate the presentation. 
Second, introducing types often requires imposing restrictions that we would 
like to avoid. For example, to add types to their system, [AZ99] need to impose 
several restrictions on their untyped language: no components with recursive 
types, and no modules as components to other modules. Finally, we do not yet 
have anything new to say in the type dimension. We believe that it is 
straightforward to adapt an existing simple module type system (e.g., [Car97,FF98,AZ99]) 
to our calculus. On the other hand, we think that enriching our module system 
with polymorphic types is a very interesting avenue for future exploration. 

Due to space limitations, our presentation is necessarily dense and 
telegraphic. Please see the companion technical report [MT00] for a more detailed 
exposition with additional explanatory text, more examples, and proofs. 

2 The Module Calculus 

In this section, we present a stratified calculus with three levels: a term calculus 
$\mathcal{T}$, a core module calculus $\mathcal{C}$, and a full module calculus $\mathcal{F}$. The three calculi 
are summarized in Fig. 1. Let $\mathcal{X}$ range over $\{\mathcal{T}, \mathcal{C}, \mathcal{F}\}$. The definition for each 
calculus $\mathcal{X}$ consists of the following: 

— The syntax for calculus terms $\mathrm{Term}_{\mathcal{X}}$ and for general one-hole contexts 
$\mathrm{Context}_{\mathcal{X}}$. If $\mathbb{X} \in \mathrm{Context}_{\mathcal{X}}$, then $\mathbb{X}\{Y\}$ denotes the result of filling the 
hole of $\mathbb{X}$ with a term $Y$. Due to the hierarchical structure of our module 
calculus, $Y$ is not necessarily a term of $\mathcal{X}$. For instance, in our hierarchy $\mathcal{T}$ 
contexts are filled with $\mathcal{T}$ terms; $\mathcal{C}$ contexts are filled with $\mathcal{T}$ terms; and $\mathcal{F}$ 
contexts are filled with either $\mathcal{C}$ or $\mathcal{F}$ terms. We assume that the notation 
$\mathbb{X}\{Y\}$ is only applied to such $\mathbb{X}$ and $Y$ that the result of the filling is a 
well-formed term in $\mathrm{Term}_{\mathcal{X}}$. For instance, the notation $\mathbb{D}\{M\}$ is defined only if 
the resulting module is a well-defined element of $\mathrm{Term}_{\mathcal{C}}$. 

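— A small-step operational semantics of $\mathcal{X}$ defined via an evaluation step 
relation $\Rightarrow_{\mathcal{X}}$ and a complementary definition of a non-evaluation step relation 
$\circ\!\!\to_{\mathcal{X}}$. For each of the three calculi we define a one-step calculus relation 
$\to_{\mathcal{X}} \;=\; \Rightarrow_{\mathcal{X}} \cup \circ\!\!\to_{\mathcal{X}}$.² The relation $\Rightarrow_{\mathcal{X}}$ is often defined in terms of an 
evaluation context $\mathrm{EvalContext}_{\mathcal{X}} \subseteq \mathrm{Context}_{\mathcal{X}}$. A term $M$ is a $\to_{\mathcal{X}}$-normal form 
(NF) if there is no term $N$ s.t. $M \to_{\mathcal{X}} N$; a $\Rightarrow_{\mathcal{X}}$-NF is defined analogously. 
For each calculus $\mathcal{X}$, there is a classification function $Cl_{\mathcal{X}}$ that maps each 
term to a “class” token that describes its state w.r.t. evaluation. The classes 
for evaluatable terms must be disjoint from those in $\Rightarrow_{\mathcal{X}}$-NF. Also associated 
with each calculus $\mathcal{X}$ is a set $\mathrm{Value}_{\mathcal{X}}$ of values that is the union of one or 
more classes of $\Rightarrow_{\mathcal{X}}$-NFs. The function $\mathrm{Outcome}_{\mathcal{X}}$ of a term is defined to be 
the class of its $\Rightarrow$-normal form or a symbol $\bot$ if the term diverges. 

² Alternatively we could have defined the rules for $\to_{\mathcal{X}}$ explicitly and then set $\circ\!\!\to_{\mathcal{X}}$ to 
be $\to_{\mathcal{X}} \setminus \Rightarrow_{\mathcal{X}}$. However, giving explicit rules for $\circ\!\!\to_{\mathcal{X}}$ clarifies the presentation. 

Syntax for the Term Calculus ($\mathcal{T}$): 

$c \in \mathrm{Const}$ = constant values; $x \in \mathrm{Variable}$ = term variables 
$v \in \mathrm{Visible}$ = external labels; $h \in \mathrm{Hidden}$ = internal labels 
$k, l \in \mathrm{Label} = \mathrm{Visible} \cup \mathrm{Hidden}$ 
$L, M, N \in \mathrm{Term}_{\mathcal{T}} ::= c \mid x \mid l \mid (\lambda x.M) \mid M_1 \mathbin{@} M_2 \mid M_1 \mathbin{op} M_2$ 
$\mathbb{C} \in \mathrm{Context}_{\mathcal{T}} ::= \Box \mid (\lambda x.\mathbb{C}) \mid \mathbb{C} \mathbin{@} M \mid M \mathbin{@} \mathbb{C} \mid \mathbb{C} \mathbin{op} M \mid M \mathbin{op} \mathbb{C}$ 
$V \in \mathrm{Value}_{\mathcal{T}} ::= c \mid x \mid \lambda x.M$ 

Notion of Reduction on Terms: 

$(\lambda x.M) \mathbin{@} V \rightsquigarrow_{\mathcal{T}} M[x := V]$ (β) 
$c_1 \mathbin{op} c_2 \rightsquigarrow_{\mathcal{T}} c$, where $c = \delta(op, c_1, c_2)$ (δ) 

Evaluation and Non-evaluation Steps: 

$\mathbb{E} \in \mathrm{EvalContext}_{\mathcal{T}} ::= \Box \mid \mathbb{E} \mathbin{@} M \mid (\lambda x.M) \mathbin{@} \mathbb{E} \mid \mathbb{E} \mathbin{op} M \mid c \mathbin{op} \mathbb{E}$ 
$\mathbb{E}\{R\} \Rightarrow_{\mathcal{T}} \mathbb{E}\{Q\}$, where $R \rightsquigarrow_{\mathcal{T}} Q$ (term-ev) 
$\overline{\mathbb{E}}\{R\} \circ\!\!\to_{\mathcal{T}} \overline{\mathbb{E}}\{Q\}$, where $R \rightsquigarrow_{\mathcal{T}} Q$ (term-nev) 

Syntax for the Core Module Calculus ($\mathcal{C}$): 

$D \in \mathrm{Term}_{\mathcal{C}} ::= [l_1 \mapsto M_1, \ldots, l_n \mapsto M_n]$ (abbreviated $[l_i \overset{n}{\underset{i=1}{\mapsto}} M_i]$), 
provided $l_i = l_j$ implies $i = j$, $FV(D) = \emptyset$, and $\mathrm{Imports}(D) \cap \mathrm{Hidden} = \emptyset$. 
$\mathbb{D} \in \mathrm{Context}_{\mathcal{C}} ::= [l_i \overset{k-1}{\underset{i=1}{\mapsto}} M_i,\; l_k \mapsto \mathbb{C},\; l_j \overset{n}{\underset{j=k+1}{\mapsto}} M_j]$ 
Projection Notation: $[l_i \overset{n}{\underset{i=1}{\mapsto}} M_i] \downarrow l_j = M_j$, if $1 \le j \le n$, and otherwise undefined. 

Evaluation and Non-evaluation Steps: 

$\mathbb{G} \in \mathrm{EvalContext}_{\mathcal{C}} ::= [l_i \overset{k-1}{\underset{i=1}{\mapsto}} M_i,\; l_k \mapsto \mathbb{E},\; l_j \overset{n}{\underset{j=k+1}{\mapsto}} M_j]$ 
$\mathbb{G}\{R\} \Rightarrow_{\mathcal{C}} \mathbb{G}\{Q\}$, where $R \rightsquigarrow_{\mathcal{T}} Q$ (comp-ev) 
$\mathbb{G}\{l\} \Rightarrow_{\mathcal{C}} \mathbb{G}\{V\}$, where $\mathbb{G}\{l\} \downarrow l = V$ (subst-ev) 
$[l_i \overset{n}{\underset{i=1}{\mapsto}} M_i,\; h_j \overset{m}{\underset{j=1}{\mapsto}} V_j] \Rightarrow_{\mathcal{C}} [l_i \overset{n}{\underset{i=1}{\mapsto}} M_i]$, where $\forall_{1 \le j \le m}.\; h_j \notin \bigcup_{i=1}^{n} FL(M_i)$ (GC) 
$\overline{\mathbb{G}}\{R\} \circ\!\!\to_{\mathcal{C}} \overline{\mathbb{G}}\{Q\}$, where $R \rightsquigarrow_{\mathcal{T}} Q$ (comp-nev) 
$\overline{\mathbb{G}}\{l\} \circ\!\!\to_{\mathcal{C}} \overline{\mathbb{G}}\{V\}$, where $\overline{\mathbb{G}}\{l\} \downarrow l = V$ (subst-nev) 

Syntax for the Full Module Calculus ($\mathcal{F}$): 

$F \in \mathrm{Term}_{\mathcal{F}} ::= D \mid l \mid F_1 \oplus F_2 \mid F[l \leftarrow l'] \mid \mathbf{let}\; l = F_1 \;\mathbf{in}\; F_2$ 
$\mathbb{F} \in \mathrm{Context}_{\mathcal{F}} ::= \Box \mid \mathbb{F} \oplus F \mid F \oplus \mathbb{F} \mid \mathbb{F}[l \leftarrow l'] \mid \mathbf{let}\; l = \mathbb{F} \;\mathbf{in}\; F \mid \mathbf{let}\; l = F \;\mathbf{in}\; \mathbb{F}$ 

Evaluation and Non-evaluation Steps: 

$D \Rightarrow_{\mathcal{F}} D'$, where $D \Rightarrow_{\mathcal{C}} D'$ (mod-ev) 
$\mathbb{F}\{[k_i \overset{n}{\underset{i=1}{\mapsto}} M_i] \oplus [l_j \overset{m}{\underset{j=1}{\mapsto}} N_j]\} \Rightarrow_{\mathcal{F}} \mathbb{F}\{[k_i \overset{n}{\underset{i=1}{\mapsto}} M_i,\; l_j \overset{m}{\underset{j=1}{\mapsto}} N_j]\}$, (link) 
where $(\bigcup_{i=1}^{n} k_i) \cap (\bigcup_{j=1}^{m} l_j) = \emptyset$ 
$\mathbb{F}\{D[l \leftarrow k]\} \Rightarrow_{\mathcal{F}} \mathbb{F}\{D[l := k]\}$, (rename) 
where $l \in BL(D)$ implies $k \notin BL(D)$, 
$l \in \mathrm{Hidden}$ implies $k \in \mathrm{Hidden}$, and 
$k \in \mathrm{Hidden}$ implies $l \notin \mathrm{Imports}(D)$. 
$\mathbb{F}\{\mathbf{let}\; l = F_1 \;\mathbf{in}\; F_2\} \Rightarrow_{\mathcal{F}} \mathbb{F}\{F_2[l := F_1]\}$, (let) 
$\mathbb{F}\{D\} \circ\!\!\to_{\mathcal{F}} \mathbb{F}\{D'\}$, where $D \to_{\mathcal{C}} D'$ (mod-nev) 
and $\mathbb{F} \neq \Box$ or $D \circ\!\!\to_{\mathcal{C}} D'$ 

Fig. 1. The three levels of the module calculus. 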

We use the following notations and conventions. If $\mathbb{X}$ ranges over $EvalContext_{\mathcal{X}}$,
then $\overline{\mathbb{X}}$ ranges over $Context_{\mathcal{X}} \setminus EvalContext_{\mathcal{X}}$ (i.e., the set of non-evaluation
contexts). For pairs of rules such as (comp-ev) and (comp-nev), which differ
only by the use of an evaluation versus a non-evaluation context, we introduce a
notation for the combined calculus rule. For instance, we say that $D \rightarrow_{\mathcal{C}} D'$ by
the rule (comp) if either $D \Rightarrow_{\mathcal{C}} D'$ by (comp-ev) or $D \circ\!\!\rightarrow_{\mathcal{C}} D'$ by (comp-nev). If
$\rightarrow$ is a one-step relation, then $\rightarrow^*$ denotes its reflexive transitive closure, and $\leftrightarrow$
denotes its reflexive, symmetric, and transitive closure.

The following properties of calculi are important in the sequel. 

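Definition 1 (Confluence). The $\rightarrow$ relation is confluent if $M_1 \rightarrow^* M_2$ and
$M_1 \rightarrow^* M_3$ implies the existence of $M_4$ s.t. $M_2 \rightarrow^* M_4$, $M_3 \rightarrow^* M_4$. A calculus
$\mathcal{X}$ has confluence if $\rightarrow_{\mathcal{X}}$ is confluent.

Definition 2 (Standardization). A calculus $\mathcal{X}$ has the standardization property
if for any sequence $M_1 \rightarrow^*_{\mathcal{X}} M_2$ there exists $M_3$ s.t. $M_1 \Rightarrow^*_{\mathcal{X}} M_3 \circ\!\!\rightarrow^*_{\mathcal{X}} M_2$.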

2.1 Term Calculus ($\mathcal{T}$)

The module calculus is built on top of a term calculus $\mathcal{T}$, a typical call-by-value
$\lambda$-calculus that includes constants (which we assume include integers) and binary
operators (we assume $op$ includes standard integer operations). For interfacing
with the module language in which it is embedded, the term syntax also includes
two disjoint classes of labels whose union, $Label$, is itself disjoint from $Variable$.

We adopt the convention that all $\lambda$-bound variables in a term must be
distinct. The free variables of a term $M$, written $FV(M)$, are defined as usual
(recall that variables are distinct from labels). The set of labels appearing in
a term $M$ is written $FL(M)$; because labels cannot be $\lambda$-bound, they always
appear “free”. The result of a capture-avoiding substitution of $M'$ for $x$ in $M$ is
written $M[x := M']$. In addition to using $\alpha$-renaming to avoid variable capture
during substitution, it may be necessary to $\alpha$-rename the result of substitution
to maintain the distinct variable naming invariant. The result of substituting a
term $M'$ for a label $l$ in $M$ is written $M[l := M']$.

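Both $\Rightarrow_{\mathcal{T}}$ and $\circ\!\!\rightarrow_{\mathcal{T}}$ are defined via a redex/contractum relation $\rightsquigarrow_{\mathcal{T}}$ specified
by a call-by-value $\beta$ rule and a $\delta$ rule (unspecified) for binary functions on
constants. Terms in $dom(\rightsquigarrow_{\mathcal{T}})$ are called term redexes. The relations $\Rightarrow_{\mathcal{T}}$ and
$\circ\!\!\rightarrow_{\mathcal{T}}$ are contextual closures of $\rightsquigarrow_{\mathcal{T}}$ with respect to an evaluation context $\mathbb{E}$ and
a non-evaluation context $\overline{\mathbb{E}}$. It is easy to see that $\rightarrow_{\mathcal{T}}$ (defined as $\Rightarrow_{\mathcal{T}} \cup \circ\!\!\rightarrow_{\mathcal{T}}$) is
the contextual closure of $\rightsquigarrow_{\mathcal{T}}$ with respect to a general context $\mathbb{C}$.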

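A term $M$ can be uniquely classified with respect to evaluation via $Cl_{\mathcal{T}}(M)$,
defined as:

$$Cl_{\mathcal{T}}(M) = \begin{cases}
\mathbf{const}(c) & \text{if } M = c \\
\mathbf{abs} & \text{if } M = \lambda x.N \\
\mathbf{evaluatable} & \text{if } M = \mathbb{E}\{R\} \\
\mathbf{var} & \text{if } M = x \\
\mathbf{stuck}(l) & \text{if } M = \mathbb{E}\{l\} \\
\mathbf{error} & \text{otherwise}
\end{cases}$$

It turns out that an evaluatable term $M$ can be uniquely parsed into $\mathbb{E}$ and $R$
such that $M = \mathbb{E}\{R\}$, so $\Rightarrow_{\mathcal{T}}$ is deterministic (i.e., it is a partial function rather
than a relation). The partial function $Eval_{\mathcal{T}}(M)$ is defined as the $\Rightarrow_{\mathcal{T}}$-NF of $M$
if it exists; otherwise, $M$ is said to diverge. The total function $Outcome_{\mathcal{T}}(M)$ is
defined as $Cl_{\mathcal{T}}(Eval_{\mathcal{T}}(M))$ if $Eval_{\mathcal{T}}(M)$ is defined, and $\bot$ if $M$ diverges. Using
classical techniques [Plo75,Bar84], it is straightforward to prove that $\rightarrow_{\mathcal{T}}$ is
confluent, and $\mathcal{T}$ has the standardization property.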



2.2 Core Module Calculus ($\mathcal{C}$)

In our module calculus, modules are unordered collections of labeled terms.
There are two disjoint classes of labels: visible and hidden. Visible labels name
components to be exported to other modules, and also name import sites within
a component, while hidden labels name components that can only be referenced
within the module itself. (This distinction is similar to the distinction between
deferred variables and expression names on one hand and local variables on the
other in [AZ99].) Intuitively, a module is a fragment of a recursively scoped
record that can be dynamically constructed by linking, where visible labels serve
to “wire” the definitions in one module to the uses in another.

A module binding is written $l \mapsto M$. A module is a bracketed set of such
bindings in which the labels of any two bindings are distinct. Note that a hole
in a module context $\mathbb{D}$ is filled with a $\mathcal{T}$-term rather than another module. The
notation $[l_i \mapsto M_i]_{i=1}^{n}$ stands for the bindings $l_1 \mapsto M_1, \ldots, l_n \mapsto M_n$, and $D \downarrow l$
extracts the component $M$ bound to $l$ in $D$ (if it exists).

Suppose that $D = [l_i \mapsto M_i]_{i=1}^{n}$. The free variables of $D$ are $FV(D) =
\cup_{i=1}^{n} FV(M_i)$. The substitution $D[l := k]$ yields $[l'_i \mapsto M_i[l := k]]_{i=1}^{n}$, where $l'_i = k$
if $l_i = l$ and $l'_i = l_i$ otherwise. The set of bound labels in $D$ is defined as
$BL(D) = \cup_{i=1}^{n} l_i$, while the set of free labels is $FL(D) = (\cup_{i=1}^{n} FL(M_i)) \setminus BL(D)$.
The exported labels of $D$ are those that are both bound and visible ($Exports(D) =
BL(D) \cap Visible$), while the imported labels are just the free ones ($Imports(D) =
FL(D)$). In order to be well-formed, a module $D$ must satisfy three conditions:
(1) all its bound labels must be distinct; (2) it must not import any hidden labels;
and (3) it must not contain any free variables (such variables would necessarily
be unbound). In a well-formed module, the hidden labels are necessarily bound,
so we define $Hid(D) = BL(D) \cap Hidden$.
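
The label sets above can be computed mechanically. The following Python sketch is only an illustration under simplifying assumptions that are not part of the calculus: each component term is abstracted to the set of labels occurring in it, a module is a dictionary from bound labels to such sets, and hidden labels are recognized by a lowercase first letter (the convention used in the examples). The function names are hypothetical, and the free-variable condition (3) is not modeled.

```python
def is_hidden(label):
    # Assumed convention: lowercase names are hidden, uppercase are visible.
    return label[0].islower()

def bound_labels(module):
    # BL(D): the labels that have a binding in the module.
    return set(module)

def free_labels(module):
    # FL(D): labels used in some component but not bound in the module.
    used = set().union(*module.values())
    return used - bound_labels(module)

def exports(module):
    # Exports(D) = BL(D) intersected with the visible labels.
    return {l for l in bound_labels(module) if not is_hidden(l)}

def imports(module):
    # Imports(D) = FL(D).
    return free_labels(module)

def hid(module):
    # Hid(D) = BL(D) intersected with the hidden labels.
    return {l for l in bound_labels(module) if is_hidden(l)}

def is_well_formed(module):
    # Condition (1) holds by construction (dictionary keys are distinct);
    # condition (2): no hidden label may be imported.
    return not any(is_hidden(l) for l in imports(module))

# F uses the hidden g and the visible import N; g uses nothing.
good = {"F": {"g", "N"}, "g": set()}
# F refers to a hidden label h that is not bound in the module.
bad = {"F": {"h"}}
```

Here `good` exports `F`, imports `N`, and hides `g`, while `bad` is rejected because it would import the hidden label `h`.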

The evaluation relation $\Rightarrow_{\mathcal{C}}$ is defined using a module evaluation context $\mathbb{G}$
which lifts the term-level evaluation context $\mathbb{E}$ to the module level. The three rules
of $\Rightarrow_{\mathcal{C}}$ allow the following reductions: (comp-ev) lifts $\Rightarrow_{\mathcal{T}}$ to the module level;
(subst-ev) substitutes a labeled value for a label occurrence in the module; (GC)
garbage collects hidden values not referenced elsewhere in the module. Unlike
$\Rightarrow_{\mathcal{T}}$, $\Rightarrow_{\mathcal{C}}$ is not deterministic, because it can perform an evaluation step on any
component. Nevertheless, $\Rightarrow_{\mathcal{C}}$ is confluent. The complementary relation $\circ\!\!\rightarrow_{\mathcal{C}}$ has
two rules, (comp-nev) and (subst-nev), which differ from their evaluation analogs
by using a non-evaluation context in place of an evaluation context. Note that the
(GC) rule does not have a non-evaluation counterpart; i.e., all (GC)-reductions
are evaluation steps.

Let us consider some examples of module reductions. Any one-step reduction
on a term component can be lifted to the module via the (comp) rule:
$[F \mapsto \lambda x.1 + 2] \circ\!\!\rightarrow_{\mathcal{C}} [F \mapsto \lambda x.3]$. This is a non-evaluation step, since the redex
occurs under a $\lambda$. As an example of (subst), consider $[A \mapsto 4, F \mapsto A + 3] \Rightarrow_{\mathcal{C}}
[A \mapsto 4, F \mapsto 4 + 3]$. Here $A$ in the second term appears in an evaluation
context. Note that a value may be substituted into itself: $[F \mapsto \lambda x.F] \circ\!\!\rightarrow_{\mathcal{C}}
[F \mapsto \lambda x.(\lambda x_1.F)] \circ\!\!\rightarrow_{\mathcal{C}} [F \mapsto \lambda x.(\lambda x_1.(\lambda x_2.(\lambda x_3.F)))]$ (where $\alpha$-renaming preserves the
distinct variable invariant). This is a non-evaluation step, since $F$ appears under
a $\lambda$. The (GC) rule garbage collects hidden values not referenced elsewhere in
the module. Consider:

$[F \mapsto \lambda w.g\ @\ (w + 1),\ f \mapsto \lambda x.h,\ g \mapsto \lambda y.y * 2,\ h \mapsto \lambda z.f]$

$\Rightarrow_{\mathcal{C}} [F \mapsto \lambda w.g\ @\ (w + 1),\ g \mapsto \lambda y.y * 2]$

The mutually recursive bindings for $f$ and $h$ can be removed because all
references to these hidden labels occur inside of the values named by these labels.
However, $g$ cannot be removed, since an exported term references it.
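
The effect of repeatedly applying (GC) can be sketched as a reachability computation. The Python fragment below is an illustration under assumptions that go beyond the text: components are abstracted to the sets of labels they mention, every hidden component is assumed to already be a value, hidden labels are lowercase, and the hypothetical function `collect` removes the largest collectible set in one pass rather than modeling single (GC) steps.

```python
def is_hidden(label):
    # Assumed convention: lowercase names are hidden, uppercase are visible.
    return label[0].islower()

def collect(module):
    """Drop hidden bindings that no visible binding reaches.

    `module` maps each bound label to the set of labels its term mentions.
    """
    # Visible bindings are always kept; they are the roots.
    live = {l for l in module if not is_hidden(l)}
    worklist = list(live)
    while worklist:
        label = worklist.pop()
        for ref in module[label]:
            # Imported labels are not bound here, so there is nothing to keep.
            if ref in module and ref not in live:
                live.add(ref)
                worklist.append(ref)
    return {l: refs for l, refs in module.items() if l in live}

# The example above: F mentions g; f and h mention only each other.
example = {"F": {"g"}, "f": {"h"}, "g": set(), "h": {"f"}}
collected = collect(example)
```

On the example, `collected` keeps `F` and `g` and drops the cycle formed by `f` and `h`, matching the reduction shown above.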

It turns out that $\mathcal{C}$ has the standardization property. But interestingly, even
though $\Rightarrow_{\mathcal{C}}$ is confluent, $\rightarrow_{\mathcal{C}}$ is not confluent, due to the possibility of mutually
recursive (subst) redexes that appear under a $\lambda$ and therefore not in an evaluation
context. Consider an example due to [AK97]: $D_0 = [F \mapsto \lambda x.G, G \mapsto \lambda y.F]$.
Then $D_0 \circ\!\!\rightarrow_{\mathcal{C}} [F \mapsto \lambda x.\lambda y'.F, G \mapsto \lambda y.F] = D_1$ and $D_0 \circ\!\!\rightarrow_{\mathcal{C}} [F \mapsto \lambda x.G, G \mapsto
\lambda y.\lambda x'.G] = D_2$. $D_1$ (resp. $D_2$) has an even (resp. odd) number of $\lambda$s for $F$ and
an odd (resp. even) number for $G$, and in every reduction sequence starting with
$D_1$ (resp. $D_2$), all terms will have this property. Clearly, reduction sequences
starting at $D_1$ and $D_2$ can never meet at a common term.
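
The parity argument can be checked on a bounded portion of the reduction graph. The Python sketch below is an illustration, not a proof: it assumes that every component has the shape "some number of λs followed by a single label", so a component is encoded as a pair (number of λs, label), and a (subst) step on a component adds the λ count of the component it refers to and takes over that component's label. The encoding and all names are hypothetical.

```python
def freeze(state):
    # Hashable, order-independent key for a module state.
    return tuple(sorted(state.items()))

def successors(state):
    """All modules reachable by one (subst) step."""
    result = []
    for comp in ("F", "G"):
        lams, target = state[comp]
        target_lams, target_label = state[target]
        nxt = dict(state)
        # Substituting the target's value prepends nothing and appends its lambdas.
        nxt[comp] = (lams + target_lams, target_label)
        result.append(nxt)
    return result

def reachable(start, depth):
    """Keys of all states reachable from `start` in at most `depth` steps."""
    seen = {freeze(start)}
    frontier = [start]
    for _ in range(depth):
        new_frontier = []
        for state in frontier:
            for nxt in successors(state):
                key = freeze(nxt)
                if key not in seen:
                    seen.add(key)
                    new_frontier.append(nxt)
        frontier = new_frontier
    return seen

def parities(key):
    # (parity of the lambda count of F, parity of the lambda count of G)
    state = dict(key)
    return state["F"][0] % 2, state["G"][0] % 2

D0 = {"F": (1, "G"), "G": (1, "F")}
D1, D2 = successors(D0)          # substitute into F, resp. into G
from_d1 = reachable(D1, 6)
from_d2 = reachable(D2, 6)
```

Within the explored depth, every state in `from_d1` has an even count for `F` and an odd count for `G`, every state in `from_d2` has the opposite parities, and the two sets are disjoint.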

The confluence of $\Rightarrow_{\mathcal{C}}$ gives rise to a partial function $Eval_{\mathcal{C}}(D)$ that, when
defined, returns a module whose components are all $\Rightarrow_{\mathcal{T}}$-normal forms. The
classification notion also lifts to the module level: $Cl_{\mathcal{C}}(D) = [l_i \mapsto Cl_{\mathcal{T}}(M_i)]_{i=1}^{n}$, where $D =
[l_i \mapsto M_i]_{i=1}^{n}$. As in the term calculus, $Outcome_{\mathcal{C}}(D) = Cl_{\mathcal{C}}(Eval_{\mathcal{C}}(D))$ if $Eval_{\mathcal{C}}(D)$
exists, and $\bot$ otherwise. We say that $D = [l_i \mapsto V_i]_{i=1}^{n}$ is a module value ($D \in
Value_{\mathcal{C}}$) if $Hid(D) = \emptyset$ and $V_i \in Value_{\mathcal{T}}$ for all $1 \le i \le n$.

Footnote: In examples, we adopt the convention that visible labels have uppercase
names while hidden labels have lowercase names.

2.3 Full Module Calculus ($\mathcal{F}$)

The full module calculus extends the core module calculus with three module
operators: linking, renaming, and binding. Intuitively, the linking of modules $D_1$
and $D_2$, written $D_1 \oplus D_2$, takes the union of their bindings. To avoid naming
conflicts between both visible and hidden labels, $BL(D_1)$ and $BL(D_2)$ must be
disjoint. The fact that the import labels of a well-formed module may not be
hidden prevents the components of one module from accessing hidden components
of another when they are linked.

The renaming operator renames any module label (visible or hidden, import 
or export). Renaming import and export labels is the way to connect an exported 
component of one module to an import site in another. Renaming a visible 
label to a fresh hidden label hides a component; a user-level “hiding” operator 
could be provided as syntactic sugar for such renaming. Finally, renaming of 
hidden variables to other hidden variables is necessary to guarantee that hidden 
variables are disjoint when they are linked. The side conditions on renaming
prevent certain undesirable scenarios: (1) attempting to rename one bound label
to another (causing a name clash); (2) renaming a hidden variable to a visible 
one, thereby exposing it; and (3) renaming a (necessarily visible) import to a 
hidden label, thereby making the module ill-formed. 

The binding operator $\mathbf{let}\ I = F_1\ \mathbf{in}\ F_2$ names the (result of evaluating the)
definition term $F_1$ and uses the name within the body term $F_2$. This models
situations in which the same module is used multiple times in different contexts.

The disjoint hidden label requirement for $\oplus$ simplifies reasoning about the
calculus, but is severe from the perspective of a user, who should not be able
to predict the names of the hidden labels of any module. We address this
problem by supplying a user-level linking operator $\widetilde{\oplus}$ that can be defined in terms
of the primitive linking operator $\oplus$ and renaming, as follows. Suppose that
the hidden labels of $F_1$ are $h_1, \ldots, h_{n_1}$ and those of $F_2$ are
$h_{n_1+1}, \ldots, h_{n_1+n_2}$. Then $F_1 \widetilde{\oplus} F_2$ is defined as:

$F_1[h_1 \leftarrow h'_1, \ldots, h_{n_1} \leftarrow h'_{n_1}] \oplus F_2[h_{n_1+1} \leftarrow h'_{n_1+1}, \ldots, h_{n_1+n_2} \leftarrow h'_{n_1+n_2}]$

where $((\cup_{i=1}^{n_1} h_i) \cup (\cup_{j=n_1+1}^{n_1+n_2} h_j)) \cap (\cup_{k=1}^{n_1+n_2} h'_k) = \emptyset$

The hidden labels of $F_1$ and $F_2$ are renamed to fresh hidden labels before the
modules are linked to avoid collisions. The renaming performed by $\widetilde{\oplus}$ is similar
to the $\alpha$-renaming required in other module calculi linking operations (e.g., in
[FF98] when rewriting the compound linking form to the unit module form).
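
A user-level link of this kind can be sketched in Python as follows. This is a minimal illustration under assumptions not taken from the text: modules are dictionaries from bound labels to the sets of labels their terms mention, hidden labels are lowercase, fresh hidden labels are generated as `h0`, `h1`, and so on, and the function names are hypothetical.

```python
import itertools

def is_hidden(label):
    # Assumed convention: lowercase names are hidden, uppercase are visible.
    return label[0].islower()

def rename(module, old, new):
    # Rename both the binding of `old` and every reference to it.
    swap = lambda l: new if l == old else l
    return {swap(l): {swap(r) for r in refs} for l, refs in module.items()}

def primitive_link(m1, m2):
    # The primitive operator requires disjoint bound labels.
    clash = set(m1) & set(m2)
    if clash:
        raise ValueError(f"bound labels are not disjoint: {sorted(clash)}")
    return {**m1, **m2}

def user_link(m1, m2):
    # Every label already in use, bound or referenced, must be avoided.
    taken = set()
    for m in (m1, m2):
        taken |= set(m)
        for refs in m.values():
            taken |= refs
    candidates = (f"h{i}" for i in itertools.count())
    renamed = []
    for m in (m1, m2):
        for h in [l for l in m if is_hidden(l)]:
            fresh = next(c for c in candidates if c not in taken)
            taken.add(fresh)
            m = rename(m, h, fresh)
        renamed.append(m)
    return primitive_link(*renamed)

# Both modules happen to use the same hidden name tmp.
left = {"F": {"tmp"}, "tmp": set()}
right = {"G": {"tmp", "F"}, "tmp": {"N"}}
linked = user_link(left, right)
```

The primitive link of `left` and `right` fails because both bind `tmp`, whereas `user_link` first renames the two hidden components apart and then takes the union of the bindings.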

The definition of $\rightarrow_{\mathcal{F}}$ lifts core module reduction steps to the module
expression level and adds evaluation rules for the link-level operators (link, rename,
and bind). The structure of $Context_{\mathcal{F}}$ allows the link-level operators to be
evaluated in any order. The lifted core module reduction steps are only considered
evaluation steps if they are not surrounded by any link-level operators; this
forces all link-level steps to be performed first in a “link-time stage”, followed by a
“run-time stage” of core module steps.

The lack of confluence of $\rightarrow_{\mathcal{C}}$ is inherited by $\rightarrow_{\mathcal{F}}$, but we are still able to
show that $\Rightarrow_{\mathcal{F}}$ is confluent and $\mathcal{F}$ has the standardization property. If $F$ is
a link, rename, or bind term, we define $Cl_{\mathcal{F}}(F)$ to be linkable; otherwise we
define $Cl_{\mathcal{F}}(F)$ to be $Cl_{\mathcal{C}}(F)$ (in this case, $F \in Term_{\mathcal{C}}$). $Outcome_{\mathcal{F}}$ is defined
analogously with $Outcome_{\mathcal{C}}$, and $Value_{\mathcal{F}} = Value_{\mathcal{C}}$.

3 Meaning Preservation 

The calculus defined in the previous section allows us to reason about module
transformations. A transformation $T$ of a calculus $\mathcal{X}$ is a relation $T : Term_{\mathcal{X}} \times Term_{\mathcal{X}}$.
Even though $T$ in general is not a function, we sometimes write $Z = T(Y)$ if
$(Y, Z) \in T$. Below we define a notion of observational equivalence and, based on
it, a notion of a meaning preserving transformation.

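Definition 3 (Observational Equivalence). Two terms $Y$ and $Z$ of a calculus
$\mathcal{X}'$ are observationally equivalent in a calculus $\mathcal{X}$ (written $Y \cong_{\mathcal{X}} Z$) if for
all contexts $\mathbb{X}$ s.t. $\mathbb{X}\{Y\}$ and $\mathbb{X}\{Z\}$ are well-formed terms of $\mathcal{X}$, $\mathbb{X}\{Y\} \Rightarrow^*_{\mathcal{X}} W$
iff $\mathbb{X}\{Z\} \Rightarrow^*_{\mathcal{X}} W'$, where $W$ and $W' \in Value_{\mathcal{X}}$ and $Cl_{\mathcal{X}}(W) = Cl_{\mathcal{X}}(W')$.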

In the definition, note that $\mathcal{X}$ may or may not be the same as $\mathcal{X}'$. As an
example, two core modules are observationally equivalent in $\mathcal{F}$ if in any full
module context $\mathbb{F}$ they evaluate to module values of the same class, as defined
above. For instance, consider the following modules: $D_1 = [F \mapsto \lambda x.x + a, a \mapsto
1 + 2]$, $D'_1 = [F \mapsto \lambda x.x + 3, a \mapsto 3]$, $D_2 = [S \mapsto N_1 + N_2]$, and $D'_2 = [S \mapsto
N_2 + N_1]$. $D_1 \cong_{\mathcal{F}} D'_1$ because the exported $F$ behaves like an “add 3” function
for both modules in any context. Assuming that $+$ is commutative, $D_2 \cong_{\mathcal{F}} D'_2$
because they evaluate to the same module value when they are placed in a
context that supplies integer values for $N_1$ and $N_2$, and neither of the two modules
evaluates to a module value if the context does not supply such values.

Definition 4 (Meaning Preservation). A transformation $T$ of a calculus $\mathcal{X}'$
is meaning preserving in a calculus $\mathcal{X}$ if $(Y, Z) \in T$ implies $Y \cong_{\mathcal{X}} Z$.

For instance, the constant folding/propagation transformation CFP in $\mathcal{C}$ is
meaning preserving in $\mathcal{F}$, as seen in the above example with $D_1$ and $D'_1$. The
example with $D_2$ and $D'_2$ illustrates that a transformation SPO that swaps the
operands of $+$ in $\mathcal{C}$ is also meaning preserving in $\mathcal{F}$.

3.1 Computational Soundness 

Proving that a transformation is meaning preserving can be difficult and tedious
work. However, if $T$ is a calculus-based transformation in $\mathcal{X}$, i.e. $Y \leftrightarrow_{\mathcal{X}} Z$ for
all $(Y, Z) \in T$, then it is automatically meaning preserving in a calculus $\mathcal{X}'$
satisfying the conditions of Lemma 1 below.

A key notion for showing the meaning preservation of calculus-based
transformations is computational soundness:

Definition 5 (Computational Soundness). A calculus $\mathcal{X}$ is computationally
sound if $M \leftrightarrow_{\mathcal{X}} N$ implies $Outcome_{\mathcal{X}}(M) = Outcome_{\mathcal{X}}(N)$, where $M, N \in
Term_{\mathcal{X}}$.

It follows from computational soundness that if two terms are equivalent in
the calculus then they are observationally equivalent in an empty context. For
observational equivalence to hold in all contexts requires embedding:

Definition 6 (Embedding). A relation $\rightarrow_{\mathcal{X}'}$ is embedded in a relation $\rightarrow_{\mathcal{X}}$
(written $\rightarrow_{\mathcal{X}'} \hookrightarrow \rightarrow_{\mathcal{X}}$) if $Y \rightarrow_{\mathcal{X}'} Z$ implies that $\mathbb{X}\{Y\} \rightarrow_{\mathcal{X}} \mathbb{X}\{Z\}$ for any context
$\mathbb{X}$ s.t. $\mathbb{X}\{Y\}$ and $\mathbb{X}\{Z\}$ are well-formed terms of $\mathcal{X}$.

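As examples of embeddings, in our module calculus, $\rightarrow_{\mathcal{T}} \hookrightarrow \rightarrow_{\mathcal{C}}$ (because term
reductions can be performed in the bindings of a module) and $\rightarrow_{\mathcal{C}} \hookrightarrow \rightarrow_{\mathcal{F}}$ (because
core module reductions can be performed within a full module term). The
self-embedding $\rightarrow_{\mathcal{X}} \hookrightarrow \rightarrow_{\mathcal{X}}$ means that the relation $\rightarrow_{\mathcal{X}}$ is a congruence relative to
the one-holed contexts of $\mathcal{X}$. For instance, $\rightarrow_{\mathcal{T}}$ and $\rightarrow_{\mathcal{F}}$ are both congruences
since they are embedded in themselves.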

Together, computational soundness and embedding imply that calculus-based 
transformations are meaning preserving. 

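Lemma 1. If a calculus $\mathcal{X}$ is sound and $\rightarrow_{\mathcal{X}'} \hookrightarrow \rightarrow_{\mathcal{X}}$, then any calculus-based
transformation $T$ in $\mathcal{X}'$ is meaning preserving in $\mathcal{X}$.

Proof. By Definition 6, $Y \leftrightarrow_{\mathcal{X}'} T(Y)$ implies that for any context $\mathbb{X}$, $\mathbb{X}\{Y\}
\leftrightarrow_{\mathcal{X}} \mathbb{X}\{T(Y)\}$. Then $Outcome_{\mathcal{X}}(\mathbb{X}\{Y\}) = Outcome_{\mathcal{X}}(\mathbb{X}\{T(Y)\})$ by soundness of
$\mathcal{X}$. By the definition of $Outcome_{\mathcal{X}}$, $\mathbb{X}\{Y\} \Rightarrow^*_{\mathcal{X}} W$ iff $\mathbb{X}\{T(Y)\} \Rightarrow^*_{\mathcal{X}} W'$, where $W$
and $W' \in\ \Rightarrow_{\mathcal{X}}$-NF and $Cl_{\mathcal{X}}(W) = Cl_{\mathcal{X}}(W')$. Since $Value_{\mathcal{X}}$ respects the ordering
of $Cl_{\mathcal{X}}$, $W$ and $W'$ are either both in or both not in $Value_{\mathcal{X}}$. □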

The soundness of the call-by-name and call-by-value $\lambda$-calculi is a classic
result due to Plotkin [Plo75]. Since the reduction relations of these calculi are
congruences (i.e., are self-embedded), Lemma 1 implies that all calculus-based
transformations in these calculi are meaning preserving.

A main result of our work is that $\mathcal{T}$, $\mathcal{C}$, and $\mathcal{F}$ are all computationally sound.
Given the four embeddings for these calculi enumerated above, Lemma 1 implies
that calculus-based transformations are meaning preserving in each of the four
cases. Many classic program transformations (both at the term and at the
module level) fall into this category: e.g., constant folding and propagation, function
inlining, and simple forms of dead-code elimination that eliminate unused value
bindings. All of these (and any combinations thereof) can easily be shown to be
meaning preserving because all are justified by simple calculus steps.

We emphasize that there are numerous common transformations that are not
calculus-based and so their meaning preservation cannot be shown via this
technique. The operand-swapping SPO transformation introduced above is in this
category. Note that $Outcome_{\mathcal{C}}(D_2) = [S \mapsto \mathbf{stuck}(N_1)]$ and $Outcome_{\mathcal{C}}(D'_2) =
[S \mapsto \mathbf{stuck}(N_2)]$, underscoring that SPO cannot possibly be expressed via
calculus steps. Global transformations like closure conversion, assignment conversion,
uncurrying, etc., are other examples of non-calculus-based transformations.

3.2 A Novel Technique for Proving Soundness

As in Plotkin’s approach, we show soundness of the module calculi in order to
prove that calculus-based transformations are meaning preserving. However, we
formulate and prove much more general conditions for soundness that do not
depend on the particulars of the module calculus or of the definition of a
program outcome. We also extend traditionally used definitions to a hierarchy of
calculi, allowing terms of one calculus to fill in contexts of another (see
Definition 3 above). Our discussion is independent of the particulars of a calculus. The
notations $M$, $N$ for terms and $\mathbb{C}$ for contexts are used below for clarity (since
these notations are more traditional); note that they are independent from the
same notations used in the term calculus $\mathcal{T}$.

Traditional proofs of computational soundness depend on confluence of
reduction in the calculus and on standardization, as well as on the following property,
which is often not articulated, but plays a critical role in soundness proofs:

Definition 7 (Class Preservation). Calculus $\mathcal{X}$ has the class preservation
property if $M \circ\!\!\rightarrow_{\mathcal{X}} N$ implies $Cl_{\mathcal{X}}(M) = Cl_{\mathcal{X}}(N)$, where $M, N \in Term_{\mathcal{X}}$.

Below we present a traditional proof of computational soundness that
generalizes Plotkin’s approach.

Theorem 1 (Soundness of a Confluent Calculus). Confluence,
standardization, and class preservation imply soundness.

Proof. The diagram of the proof is shown in Fig. 2. Assume that $M \leftrightarrow_{\mathcal{X}} N$
and that $M \Rightarrow^* M' = Eval(M)$. By confluence there exists $L$ s.t. $M' \rightarrow^* L$,
$N \rightarrow^* L$. Since $M'$ is a normal form w.r.t. $\Rightarrow$, there cannot be an evaluation
sequence starting at $M'$, so $M' \circ\!\!\rightarrow^* L$. By standardization, $N \rightarrow^* L$ implies that
there is $N'$ s.t. $N \Rightarrow^* N' \circ\!\!\rightarrow^* L$. Since $M'$, $L$, and $N'$ are connected only by
$\circ\!\!\rightarrow$, by class preservation, $Cl(M') = Cl(L) = Cl(N')$, and since $N'$ is of the same
class as $M'$, it must also be a normal form w.r.t. $\Rightarrow$, so $N' = Eval(N)$.

Now assume that $M$ diverges. If $Eval(N)$ exists, then by the above argument
we can show that $Eval(M)$ exists as well. So if $M$ diverges, then so does $N$. □

The above approach does not work for a calculus that lacks confluence. But it 
turns out that general confluence is not required for soundness! Since the outcome 
of a term is defined via the evaluation reduction, we can instead use a weaker 
form of confluence: confluence with respect to evaluation. The two properties 
given below that we call lift and project (see also Fig. 3), together with the class 
preservation property, are sufficient to show soundness. 

Definition 8 (Lift). A calculus has the lift property if for any reduction
sequence $M \circ\!\!\rightarrow N \Rightarrow^* N'$ there exists a sequence $M \Rightarrow^* M' \circ\!\!\rightarrow^* N'$.



Footnote: In Figs. 2 and 3, double-headed arrows denote reflexive, transitive closures
of the respective relations, and a line with arrows on both ends denotes the reflexive,
symmetric, transitive closure of the respective relation.

[Figure: diagram relating $M' = Eval(M)$, $L$, and $N' = Eval(N)$, with $Cl(M') = Cl(L) = Cl(N')$. Legend: $\Rightarrow$ evaluation step; $\circ\!\!\rightarrow$ non-evaluation step; $\rightarrow$ calculus step.]

Fig. 2. Sketch of the traditional proof of computational soundness.

Fig. 3. The lift and project properties.



Definition 9 (Project). A calculus has the project property if $M \circ\!\!\rightarrow N$, $M \Rightarrow^*
M'$ implies that there exist terms $M''$, $N'$ s.t. $M' \Rightarrow^* M''$, $N \Rightarrow^* N'$, and
$M'' \circ\!\!\rightarrow^* N'$.

The project property is the formalization of the notion of confluence w.r.t.
evaluation mentioned above. It says that an evaluation step and a non-evaluation
step leaving the same term can always be brought back together. The lift
property is equivalent to standardization: any reduction sequence can be
transformed into a standard sequence by pushing “backwards” sequences of evaluation
steps through single non-evaluation steps. There is a benefit in proving
standardization using the lift property (rather than directly): proofs of both the lift and
project properties use the same mechanism (certain properties of residuals and
finite developments [Bar84]) and share several intermediate results.

The following theorem embodies our new approach to proving soundness: 

Theorem 2. Suppose that $\Rightarrow$ is confluent. Then lift, project, and class
preservation imply soundness.

Proof. We want to show that if $M \leftrightarrow N$, then $Outcome(M) = Outcome(N)$.
Without loss of generality assume that $M$ and $N$ are connected by a single step.
Assume that $Outcome(M) \neq \bot$. Let $M' = Eval(M)$. In all four of the following
cases, $Outcome(M) = Outcome(N)$:

- $M \circ\!\!\rightarrow N$. By the project property, $M \Rightarrow^* M'$ implies that there exist $M''$, $N'$
  s.t. $M' \Rightarrow^* M''$, $N \Rightarrow^* N'$, and $M'' \circ\!\!\rightarrow^* N'$. But $M'$ is a normal form w.r.t.
  $\Rightarrow$, so $M' = M''$. By the class preservation property $Cl(M') = Cl(N')$, so $N'$ is
  also a normal form. Hence $N' = Eval(N)$, and $Outcome(M) = Outcome(N)$.

- $N \circ\!\!\rightarrow M$. Similar to the previous case by the lift property.

- $M \Rightarrow N$. By confluence of $\Rightarrow$ there exists $N'$ s.t. $N \Rightarrow^* N'$, $M' \Rightarrow^* N'$. But
  $M'$ is a normal form, so $N \Rightarrow^* M' = Eval(N)$.

- $N \Rightarrow M$. Then by transitivity of $\Rightarrow^*$, $N \Rightarrow^* M' = Eval(N)$.

Now let $Outcome(M) = \bot$. Assuming $Outcome(N) \neq \bot$, by the above
argument $Outcome(M) = Outcome(N) \neq \bot$, and we get a contradiction. □

$\mathcal{C}$ and $\mathcal{F}$ satisfy the lift, project, and class preservation properties, so they
enjoy the soundness property. For the technical details, consult [MT00].

4 Weak Distributivity 

We say that a module transformation $T$ is weakly distributive if and only if
$T(D_1 \oplus D_2) = T(T(D_1) \oplus T(D_2))$, where $=$ is syntactic equality (modulo
$\alpha$-renaming and module binding order).

Let $T_{link}$ be a single module transformation performing all link-time
optimizations. Suppose that the translator from source modules to intermediate modules
is given by $s2i(D) = T_{link}(D)$. Also suppose that the linking operator on
intermediate modules is defined as $D_1 \oplus_{link} D_2 = T_{link}(D_1 \oplus D_2)$. Then if $T_{link}$ is
weakly distributive, we have that $s2i(D_1) \oplus_{link} s2i(D_2) = T_{link}(T_{link}(D_1) \oplus T_{link}(D_2))
= T_{link}(D_1 \oplus D_2) = s2i(D_1 \oplus D_2)$. Thus, compiling a “link tree” of modules in
the link-time compilation model gives exactly the same code as compilation in
the whole-program model. This is the sense in which weakly distributive
transformations are promising candidates for link-time optimizations.

Here we briefly discuss two classes of weakly distributive module
transformations $T$. We assume the following about $T$: (1) it is strongly normalizing; and (2)
if $T$ can be applied to a module $[X_i \mapsto M_i]_{i=1}^{n}$, then it can be applied to a module
$[X_i \mapsto M_i, Y_j \mapsto N_j]_{i=1,j=1}^{n,m}$, i.e. to the same module with extra bindings. To
motivate the second assumption, let $FI$ be function inlining on modules restricted
to non-recursive substitutions (so that the first assumption is satisfied).
Consider the following inlining/linking sequence:

$[X \mapsto \lambda w.Y, Z \mapsto \lambda x.X] \oplus [Y \mapsto \lambda y.Z]$
$\xrightarrow{FI} [X \mapsto \lambda w.Y, Z \mapsto \lambda x.\lambda w'.Y] \oplus [Y \mapsto \lambda y.Z]$
$\rightarrow [X \mapsto \lambda w.Y, Z \mapsto \lambda x.\lambda w'.Y, Y \mapsto \lambda y.Z]$
$\xrightarrow{FI} [X \mapsto \lambda w.\lambda y.Z, Z \mapsto \lambda x.\lambda w'.Y, Y \mapsto \lambda y.Z]$

On the other hand, linking first gives: $[X \mapsto \lambda w.Y, Z \mapsto \lambda x.X, Y \mapsto \lambda y.Z]$, and at
this point the cycle becomes apparent, and no inlining is possible. Thus, extra
bindings can prevent weak distributivity by blocking the transformation.

Footnote: For simplicity, we assume the source and intermediate languages are the same.

A simple class of weakly distributive transformations are those satisfying two
conditions: (1) idempotence: $T(T(D)) = T(D)$; and (2) (strong) distributivity
over $\oplus$: $T(D_1 \oplus D_2) = T(D_1) \oplus T(D_2)$. It is easy to show that such a $T$ is
weakly distributive. Examples include many combinations of intra-term
transformations, such as constant folding/propagation, dead code elimination, and
function inlining (restricted to non-recursive cases). Note that the second
condition implies that the transformation independently transforms the components
of a module; i.e., the transformation cannot use the (subst) or (GC) rule.
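
The two conditions can be tried out on a toy transformation. The Python sketch below is an illustration under assumptions that are not from the text: terms are integers, label names, or nested tuples `(op, left, right)` with `op` being `"+"` or `"*"`; a module is a dictionary from labels to terms; the hypothetical transformation `T` only folds constant subterms inside each component (no (subst), no (GC)); and dictionary equality stands in for syntactic equality modulo binding order.

```python
def fold(term):
    # Fold constant subterms bottom-up; labels (strings) are left alone.
    if isinstance(term, tuple):
        op, left, right = term
        left, right = fold(left), fold(right)
        if isinstance(left, int) and isinstance(right, int):
            return left + right if op == "+" else left * right
        return (op, left, right)
    return term

def T(module):
    # Intra-term constant folding, applied to each component independently.
    return {label: fold(term) for label, term in module.items()}

def link(d1, d2):
    # Primitive linking: union of bindings with disjoint bound labels.
    if set(d1) & set(d2):
        raise ValueError("bound labels must be disjoint")
    return {**d1, **d2}

D1 = {"F": ("+", ("+", 1, 2), "a"), "a": ("*", 2, 3)}
D2 = {"S": ("+", "N", ("+", 1, 1))}

idempotent = T(T(D1)) == T(D1)
distributive = T(link(D1, D2)) == link(T(D1), T(D2))
weakly_distributive = T(link(D1, D2)) == T(link(T(D1), T(D2)))
```

For these two modules all three checks hold. Note that `F` keeps its reference to `a` even though `a` folds to `6`: propagating that value would require the (subst) rule, which this per-component `T` deliberately does not use.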

Closures of confluent transformations $T$ form another class of weakly
distributive transformations. It is possible to simulate any transformation step in
$T(T(D_1) \oplus T(D_2))$ by a corresponding step in $T(D_1 \oplus D_2)$. Using confluence,
strong normalization, and the extra-bindings assumption, it can be shown that
the two expressions transform to the same result. For example, constant
folding/propagation at the module level (i.e., including the (subst) rule) has all of
these properties, and so is weakly distributive.

5 Future Work 

There are several directions in which we plan to extend the work presented here. 

Types: We are exploring several type systems for our module calculus,
especially ones which express polymorphism via intersection and union types.
These have intriguing properties for modular analysis and link-time
compilation [Jim96,Ban97,KW99].

Non-local Transformations: So far, we have only considered meaning
preservation and weak distributivity in the context of simple local transformations.
We are investigating global transformations like closure conversion, uncurrying,
and useless variable elimination in the context of link-time compilation.

Weakening Weak Distributivity: Weak distributivity requires the rather strong
condition of syntactic equality between $T(D_1 \oplus D_2)$ and $T(T(D_1) \oplus T(D_2))$.
Weaker notions of equality may also be suitable. Note that “has the same meaning
as” is too weak, since it does not capture the pragmatic relationship between the
two sides; they should have “about the same efficiency”.

Abstracting over the Base Language: Our framework assumes that the module 
calculus is built upon a particular base calculus. Inspired by [AZ99], we would 
like to parameterize our module calculus over any base calculus. 

Pragmatics: We plan to empirically evaluate if link-time compilation can give
reasonable “bang for the buck” in the context of a simple prototype compiler.

Abstract. In order to deal efficiently with infinite regular trees (or other
pointed graph structures), we give new algorithms to store such structures.
The trees are stored in such a way that their representation is unique
and shares as much as possible. This maximal sharing allows substantial
memory gain and speed up. For example, equality testing becomes
constant time. The algorithms are incremental, and as such allow good
reactive behavior. These new algorithms are then applied to the
representation of sets of trees. The expressive power of this new representation
is exactly what is needed by set-based analysis.



1 Introduction 

When applying set-based analysis techniques for practical applications, one is 
surprised to see that the representation of the sets of trees is not very efficient. 
Even when we use tree automata, we cannot overcome this problem without 
performing a minimization of the whole automaton at each step. We propose a 
new way of dealing with this kind of structure to get a representation that is as 
small as possible during the computation. 

After analysis of the problem, it appears that the underlying structure we
want to optimize can be described mathematically as regular infinite trees.
Because tree structures appear everywhere in computer science where a hierarchy
occurs, we found it interesting to present the algorithms in an independent way.
In this way, our technique appears as an extension of an efficient solution to
store finite trees.

The representation we extend uses just the minimum amount of memory by 
sharing equivalent subtrees. This saves a lot of space. It is used, for example, 
with sets of words represented as a tree to share common prefixes. It is possible 
to share the subtrees incrementally, and at the same time to give a unique 
representation to different versions of the same trees. Such a technique allows 
constant time equality testing and a great speed up for many other algorithms 
manipulating trees. It has been the source of the success of Binary Decision 
Diagrams (BDDs) [2], which are considered the best representation for boolean 
functions so far. 
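
For finite trees, this kind of incremental sharing is commonly implemented by hash-consing. The Python sketch below illustrates the general idea only and is not the algorithm of this article: the class name `TreeStore` and its method `make` are hypothetical, nodes are plain tuples, and children are identified by object identity, which is safe here because every canonical node is kept alive by the table.

```python
class TreeStore:
    """Builds finite trees so that equal trees are the same object."""

    def __init__(self):
        # Maps (label, identities of the canonical children) to the node.
        self.table = {}

    def make(self, label, *children):
        # Children must themselves come from this store, so two equal
        # subtrees are already the same object and share one identity.
        key = (label, tuple(id(child) for child in children))
        node = self.table.get(key)
        if node is None:
            node = (label, children)
            self.table[key] = node
        return node

store = TreeStore()
first = store.make("f", store.make("a"), store.make("a"))
second = store.make("f", store.make("a"), store.make("a"))
```

Because both constructions return the same object, equality of trees reduces to the identity test `first is second`, and the table holds one node for `a` and one for `f(a, a)` no matter how often they are rebuilt.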

But as soon as a loop occurs somewhere in the data, finite tree techniques are
no longer adequate. The main contribution of this article is to extend the good
results of unique sharing representation from finite trees to infinite trees. These
techniques are applied to the representation of sets of trees in set-based analysis,
but they can also be applied directly to the representation and manipulation of
finite automata, or infinite boolean functions [14].

After a recollection of the classic results over finite trees in section 2, we 
present the solutions for the most difficult problems with infinite trees in
section 3 on cycles. The general problem is then treated in section 4, with a
full example. Complexity issues and algorithms to manipulate infinite trees are 
discussed in section 5. The application to sets of trees implies the description of a 
new encoding to keep the uniqueness of the representation. This new contribution 
is described in section 6. 



2 Classic Representation of Trees 

2.1 Trees and Graphs 

As we deal with the computer representation of data structures, we must give a 
clear meaning to the word representation, and in particular clearly distinguish 
between what is represented and what is the representation. For this reason, we 
will give a mathematical definition of what is a tree, and another one for the 
way it is usually stored in a computer. 

Let $\mathbb{N}^*$ be the set of words over $\mathbb{N}$, $\varepsilon$ denoting the empty word. We write $\prec$
for the prefix ordering on words and $u.v$ for the concatenation of the words $u$ and $v$.
Let $F$ be a finite set of labels.

Definition 1. A tree $t$ labeled by $F$ is a function of $pos(t) \rightarrow F$ such that
$pos(t) \subseteq \mathbb{N}^*$ and $\forall p \in \mathbb{N}^*$, $\forall i \in \mathbb{N}$, $p.i \in pos(t) \Rightarrow (p \in pos(t)$ and $\forall j < i$, $p.j \in
pos(t))$.

Let $p \in pos(t)$. The subtree of $t$ in $p$, written $t[p]$, is defined by: $pos(t[p]) =
\{q \in \mathbb{N}^* \mid p.q \in pos(t)\}$, and $t[p](q) = t(p.q)$. A tree is uniquely determined by
the label of its root, $t(\varepsilon)$, and by the children of the root, the different $t[i]$, $i \in \mathbb{N}$.

In the sequel, a generic tree will be denoted
$\begin{array}{c} f \\ \diagup\ \diagdown \\ t_0 \cdots t_{n-1} \end{array}$, where $f$ is the label of the
root, and $(t_i)_{i<n}$ are the children of the root.

When representing a tree in a computer, we usually use one computer location
for each position $p$ in $pos(t)$, where we store the label $t(p)$ and the location of
the different children (the $p.i$’s in $pos(t)$) of this position. Such a representation
is well modeled by a graph, where each node of the graph corresponds to a
computer location. We do not give the most general definition of graphs, but the
definition that is useful in this article to represent trees.

Definition 2. A graph $G$ labeled by $F$ is composed of two sets, the node set,
$G_N$, and the edge set, $G_E \subseteq G_N \times G_N \times \mathbb{N}$, and every node of the graph is
associated with a label in $F$.

We define the notion of path in a graph: let $p \in \mathbb{N}^*$; $p$ is a path of the node $N$
if and only if $p = \varepsilon$, or $p = i.q$ and there is an $M \in G_N$ such that $(N, M, i) \in G_E$
and $q$ is a path of $M$. If $O$ is the only node at the end of the path, we write
$N.p = O$. We define $G(N)$ as the graph defined by the nodes which can be
reached from $N$. We will often identify a node $N$ and the graph $G(N)$.

Definition 3. A node $N$ represents a tree $t$ if and only if the set of paths of $N$
is $pos(t)$, and $\forall p \in pos(t)$, $N.p$ is well defined, and its label is $t(p)$.
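To make the notions of path and of $N.p$ concrete, here is a small Python sketch of a labeled graph in which the edge labeled $i$ of a node is simply the $i$-th entry of a list. The class `Node` and the function `follow` are hypothetical names chosen for this illustration.

```python
class Node:
    """A graph node: a label plus ordered outgoing edges."""

    def __init__(self, label):
        self.label = label
        self.children = []   # children[i] is the target of the edge labeled i


def follow(node, path):
    """Return N.p, or None when p is not a path of N."""
    for i in path:
        if i >= len(node.children):
            return None
        node = node.children[i]
    return node


# A two-node graph representing an infinite regular tree:
# the root is labeled f, its child 0 is the leaf a, its child 1 is the root again.
leaf = Node("a")
loop = Node("f")
loop.children = [leaf, loop]
```

Every word in 1* and 1*0 is a path of `loop`, so this finite graph represents a tree with infinitely many positions.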

A finite tree t is a tree such that pos(t) is finite. There is always a possible
representation by a finite graph for finite trees. In the most common use, one 
node corresponds to each path of the finite tree. 

A regular tree t is a tree such that the number of distinct subtrees of t is 
finite. Such a tree can be infinite, but it can still be represented by a finite graph 
[6], see Fig. 1 for an example. 



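[Figure: the regular tree t, given by regular sets of positions with their labels (for instance, the positions (10)*0 are labeled a), next to a finite graph that represents it. Drawing not recoverable from the source.]

Fig. 1. An infinite regular tree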



2.2 Best Representation 

The naive representation, which consists in using any graph representing the 
tree [6], is very easy to deal with and quite widely used for small problems. But 
we can do far better if we observe that some nodes can represent different paths 
of the tree, as long as the subtrees at these paths are the same. This is called 
sharing the subtrees (see e.g. [1]). In fact, the best we can do is to have exactly 
one node for each distinct subtree. This is what we call the best representation 
of a tree. In the case of finite trees, this can save a lot of space, and even time by 
memoizing [15], and in the case of infinite regular trees, we avoid the possibility 
of unbounded representation for a given tree. 

When dealing with many trees, we can do even better: considering the entire 
computer memory as one graph, we can optimize the representation for all the 
trees, and have in effect exactly one memory location for each distinct tree we 
need to store. An immediate consequence is that we just have to compare the 
locations of the roots (the nodes representing the trees) to compare entire trees.
Such a technique is used, e.g., in BDDs [2] to achieve impressive speed-ups and
memory gains.

The technique to obtain the best representation of the trees uses a dictionary 
mechanism linking keys to nodes of the graph, usually a hash table. The keys 
are built incrementally: if the keys for the $(t_i)_{i<n}$ are known and linked to the
nodes $(N_i)_{i<n}$, then the key for the tree $f(t_0, \ldots, t_{n-1})$ is $(f, (N_i)_{i<n})$. Each time a key is not

present in the dictionary, it is associated with a new node N, with edges to the 
Ni’s. If we come to a tree whose key is already in the dictionary, we use the 
corresponding node. As the trees are always built from leaves to root, we have 
indeed a best representation for the trees. 



3 Dealing with Cycles 



When representing infinite trees, though, we cannot go from the leaves to the 
root, so we cannot start the key mechanism which leads to the best represen- 
tation. The difficulty lies in the infinite paths of the tree, that is the cycles of 
the graph representing the tree. Whereas in finite trees there is no need to see 
beyond the immediate children of a given node, when dealing with cycles, we 
can have reasons to look further, in order to detect the two causes of cycle
unfolding: cycle growth and root unfolding. For example, consider the cycle
through two nodes labeled a and b. [Two inline diagrams, lost in extraction,
showed an example of cycle growth and an example of root unfolding for this
cycle.] In this very simple example, it is easy to reduce root unfolding by
looking at the key of the root, but it is much more difficult if the root itself is 
still in another cycle. In order to concentrate on the real difficulties, we suppose 
in this section that we deal with strongly connected graphs, that is graphs such 
that there is a path between any pair of nodes. 



3.1 Cycle Growth and Tree Keys 

We write $=_{tree}$ for the equivalence between nodes representing the same tree. The
goal of cycle growth reduction is to find an equivalent graph with the minimum
number of nodes. In such a graph, whatever the nodes $N$ and $M$, $N =_{tree} M \Rightarrow N = M$.
Such a problem is called a partitioning problem. It has been solved
in time $n \log(n)$ by Hopcroft [10] for finite automata, and in the general case
by [4]. We call share(N) the algorithm that takes a node N and modifies the
associated graph so that it has the fewest possible nodes (Fig. 2).
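For illustration only, the effect of share can be reproduced with a naive partition refinement. The sketch below is not Hopcroft's $n \log(n)$ algorithm nor the algorithm of [4]; it is a simple quadratic refinement, and the graph encoding (a dictionary from node names to a label and a tuple of successors) as well as the names `tree_classes` and `share` are assumptions made here.

```python
def tree_classes(graph):
    """graph: dict node -> (label, tuple of successor nodes).

    Returns a dict node -> class id such that two nodes share an id
    exactly when they represent the same (possibly infinite) tree.
    """
    cls = {n: 0 for n in graph}
    count = 1
    while True:
        ids = {}
        new = {}
        for n, (label, succ) in graph.items():
            # Signature: own label plus the current classes of the children.
            sig = (label, tuple(cls[m] for m in succ))
            new[n] = ids.setdefault(sig, len(ids))
        if len(ids) == count:      # no class was split: the partition is stable
            return new
        cls, count = new, len(ids)


def share(graph, root):
    """Return an equivalent graph with one node per class, and the new root."""
    cls = tree_classes(graph)
    rep = {}
    for n in graph:
        rep.setdefault(cls[n], n)          # first node seen represents its class
    small = {
        n: (label, tuple(rep[cls[m]] for m in succ))
        for n, (label, succ) in graph.items()
        if rep[cls[n]] == n
    }
    return small, rep[cls[root]]


# Cycle growth: the cycle a -> b -> a -> b -> (back to the start).
grown = {0: ("a", (1,)), 1: ("b", (2,)), 2: ("a", (3,)), 3: ("b", (0,))}
```

On this example, `share(grown, 2)` collapses the four nodes into the two-node cycle through a and b.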




Fig. 2. Application of the share algorithm. 
Cycle growth reduction corresponds to the state of the art in automata
representation. But we want to go further: we need the representation to be
unique whatever the different versions of the same tree. To achieve this, we give
a key which distinguishes between non-isomorphic graphs. This key is associated
with a given node N of the graph. It is a finite tree which corresponds to the
graph as long as we do not loop, but as soon as we loop, the label of the node
is replaced by its access path from N. It is denoted treeKey(N). See Fig. 3
for an example. The isomorphism between graphs is not the same thing as $=_{tree}$.




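[Figure: a graph with two nodes, labeled a and b, followed by the two finite trees that are the tree keys of these nodes; in a tree key, a node that closes a loop is replaced by its access path, such as $\varepsilon$. Drawing not recoverable from the source.]

Fig. 3. A graph, followed by the tree keys of its two nodes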



In general it can differentiate two graphs which represent the same tree. The 
interesting point is that it is indeed the same relation on graphs with a minimal 
number of nodes. 

Proposition 1. Whatever $M$ and $N$, such that $G(M)$ and $G(N)$ are graphs with
minimal number of nodes, treeKey($M$) = treeKey($N$) $\Leftrightarrow$ $M =_{tree} N$.

Proof. The difficult point is $M =_{tree} N \Rightarrow$ treeKey($M$) = treeKey($N$). Suppose
there are $M$ and $N$ such that $G(M)$ and $G(N)$ are graphs with minimal number of
nodes, $M =_{tree} N$ and treeKey($M$) $\neq$ treeKey($N$). Let $t_M$ = treeKey($M$) and
$t_N$ = treeKey($N$). Because $t_M \neq t_N$, there is a path $p$ such that $t_M(p) \neq t_N(p)$.
But if $t_M(p)$ is a label of the graph, $t_M(p)$ is the label of $M.p$, and the same
holds for $N$. Because $M =_{tree} N$, $M.p$ and $N.p$ have the same label, so at least
one of $t_M(p)$ or $t_N(p)$ is not a label of the graphs (and so is in $\mathbb{N}^*$), say $t_M(p)$.
It means there is a $q \prec p$ such that $M.q =_{tree} M.p$. So $N.q =_{tree} N.p$, but by
minimality of the number of nodes of $G(N)$, $N.q$ and $N.p$ must be the same
node, and so $t_N(p) = q = t_M(p)$. □

Because we can find an equivalent graph with minimal number of nodes 
for strongly connected graphs, we have a valid key mechanism for any strongly 
connected graph: we first apply share, then treeKey. 
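A possible reading of treeKey, sketched in Python: the graph is unfolded from the chosen node, and a node that is already on the current path is replaced by the access path at which it was first met. The graph encoding and the names `tree_key` and `Ref` are assumptions made for this illustration; the paper gives no code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Stands for a node that closes a loop: only its access path is kept."""
    path: tuple


def tree_key(graph, n, path=(), on_path=None):
    """graph: dict node -> (label, tuple of successors). Returns a finite tree."""
    on_path = {} if on_path is None else on_path
    if n in on_path:
        return Ref(on_path[n])            # we loop: access path replaces the label
    label, succ = graph[n]
    on_path[n] = path
    key = (label, tuple(tree_key(graph, m, path + (i,), on_path)
                        for i, m in enumerate(succ)))
    del on_path[n]                        # leave the current path
    return key


# Two isomorphic minimal graphs with different node names ...
g = {0: ("a", (1,)), 1: ("b", (0,))}
h = {"x": ("a", ("y",)), "y": ("b", ("x",))}
# ... and a non-minimal graph representing the same tree (cycle growth).
grown = {0: ("a", (1,)), 1: ("b", (2,)), 2: ("a", (3,)), 3: ("b", (0,))}
```

The keys of `g` and `h` coincide because node names never appear in a key, while the key of `grown` differs, which is why share has to be applied before treeKey.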

3.2 Root Unfolding and Partial Keys 

With just share and treeKey (applied to every node), we can have a unique 
representation that shares common subtrees. But as we need to start the whole 
process from the beginning for each little modification in the trees, such a process 
would be quite slow. Moreover, it is much better to apply the share algorithm on 
the smallest possible graphs. As it is not a linear algorithm, we have better results
if we can split the graph and apply the algorithm to each separate subgraph only. 

The finite parts of the tree can always be treated in the classic way, while 
the loops will need a special treatment. In order to decompose the graph and 
mark those parts of the graph which have been definitely treated, we introduce 
partial keys. A partial key looks like a node key for a finite tree, a label followed
by a vector of nodes, except that for some parts of the vector, there is no node
(see Sect. 4.3 for an example). A partial key $k$ has a name, name($k$) $\in F$, and
is a partial function from $\mathbb{N}$ to nodes. A graph labeled by partial keys is such
that for every node $N$ in the graph, if $k$ is the partial key for $N$, the edges in
the graph correspond to those integers for which the partial key is not defined.
For example, if a node is labeled by $f$ of arity 3, we can have a partial key
which is not defined on 0 and 1 (we write a $\bullet$), and on 2 its value is the node
number 4. We write $(f, \bullet\bullet\square_4)$ for this partial key. The only edges that can leave
from such a node would be labeled by 0 and 1. The idea is that what is in the
partial keys is uniquely represented. In our example, the node number 4, $\square_4$,
is a unique representation of some tree. Later on during the computation, it is
possible that we have a unique representation for the first component, say with
node $\square_2$, and the partial key becomes $(f, \square_2 \bullet \square_4)$. When a partial key is full
(defined everywhere), then the node should be a unique representation.
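The partial keys of this example can be modeled by a small immutable record. The following Python sketch is an assumption about one convenient encoding (the class `PartialKey` and its methods are hypothetical names): an undefined component is stored as `None` and uniquely represented nodes are referred to by their number.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PartialKey:
    name: str                             # the label, name(k)
    slots: Tuple[Optional[int], ...]      # None plays the role of the bullet

    def is_full(self) -> bool:
        """A full key is defined everywhere."""
        return all(s is not None for s in self.slots)

    def open_edges(self):
        """Integers on which the key is undefined: the edges still in the graph."""
        return [i for i, s in enumerate(self.slots) if s is None]

    def fill(self, i: int, node: int) -> "PartialKey":
        """Return the key extended with a uniquely represented node at index i."""
        slots = list(self.slots)
        slots[i] = node
        return PartialKey(self.name, tuple(slots))

    def show(self) -> str:
        body = "".join("•" if s is None else f"□{s}" for s in self.slots)
        return f"({self.name}, {body})"


k = PartialKey("f", (None, None, 4))      # the key written (f, ••□4)
k2 = k.fill(0, 2)                         # later on: (f, □2•□4)
```

Because the record is frozen and hashable, a full key can be used directly as a dictionary key, which is what the dictionary D of the next section requires.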

These new graphs have a new equivalence relation, $=_{pk}$, which is implied by
$=_{tree}$. This new equivalence relation corresponds to $=_{tree}$ after the expansion of
the partial keys into the graph.

But now, with those partial keys, we can have a strongly connected graph
such that, by root unfolding, one of its nodes is equivalent to a node in a partial
key. Figure 4 shows a case of root unfolding, which can be as big as we want,
even after cycle growth reduction (in this figure, dotted lines correspond to
nodes stored in partial keys). So, we must look for such a node, even before
applying the share algorithm.

Fig. 4. Root unfolding of a cycle

The name of the algorithm performing this task is shareWithDone(N). It
returns N if and only if no other node in the partial keys is equivalent to N.
Otherwise, it returns the node in the partial keys that is equivalent to N. This
algorithm uses some properties of the graph to reduce the complexity of the
computation. Let G be the graph associated with N. As always in this section, 
we suppose that G is strongly connected. We call H the graph already computed 
and that is reachable from the partial keys of G. The algorithm determines 
whether a node of G is equivalent to a node of H. If it is the case, then there 
is root unfolding. If not, there is no root unfolding. We show that it is enough 
to verify this property for one node to treat the entire graph G because G is 
strongly connected. Suppose N is equivalent to M in H. Then, whatever the 
legible path p, N.p is equivalent to M.p. Because H has been treated already, 
any M.p is in H, and because G is strongly connected, any node of G is an N.p.

There is a kind of reciprocal property that is exploited too: for some subsets
of $H_N$, if no node of the subset is equivalent to a particular node of $G$, then
they are not equivalent to any node of $G$. A subset of $H_N$ is said to be closed
if and only if, for every legible path $p$, for every node $N$ in the subset, $N.p$ is in
the subset.

Proposition 2. $\forall H' \subseteq H_N$ such that $H'$ is closed, if $\exists N \in G_N$ such that
$\forall M \in H'$, $N \neq_{pk} M$, then this holds for every $N \in G_N$.

Proof. Let $H'$ be such a subset and $N$ a node of $G$. If $N$ is not equivalent to
any node in $H'$, then, suppose there is an $M \in G_N$ and an $O \in H'$ such that $M$
is equivalent to $O$. As $G$ is strongly connected, there is a $p$ such that $M.p = N$.
So, $N$ would be equivalent to $O.p$, which is in $H'$ because $H'$ is closed, a
contradiction. This proves that no element
of $G_N$ is equivalent to any element of $H'$. □

Because of these properties, we can use the following algorithm for
shareWithDone: we just compare every node of G with the nodes that are reachable
from their partial keys and not already encountered. This comparison can be
quite efficient by exploiting the fact that the nodes in the partial keys are unique 
representations of trees, although we have a quadratic worst case complexity. 

We will show in the next section, that by applying first shareWithDone, 
then share and then treeKey, we can indeed represent uniquely (and with the 
least possible number of nodes) any strongly connected graph, in an incremental 
process. 

4 The Best Representation for Infinite Trees 

4.1 Informal Presentation 

In order to show how we can produce the best representation for an infinite 
tree, we solve the following problem: considering a graph representing a tree t, 
return an equivalent graph with a minimal number of nodes. To achieve this 
in an incremental way, we use two dictionary mechanisms and a decomposition 
of the graph. First, we apply the classic algorithm, using the dictionary D, on 
the finite subtrees of the tree. When a finite subtree is entirely treated, it is 
incorporated in the graph through partial keys. Second, when there is no more 
finite subtree, there is a subtree represented by a strongly connected graph. The 




282 L. Mauborgne 



dictionary $D_G$ stores the tree keys of such graphs, and after shareWithDone and,
if necessary, share, we can decide whether another equivalent graph has already
been encountered, and if not, use new nodes. When the strongly connected graph
is treated, it is considered as just a node, and so we can iterate our algorithm
until we give the representation of the root.



4.2 The Algorithm 

We assume a dictionary $D$ which maps full keys to nodes corresponding
to a unique representation of the associated tree, and a dictionary $D_G$ which
maps tree keys (in fact keys of these finite trees) to nodes corresponding to a
unique representation of the associated strongly connected graph.

The algorithm uses local dictionaries too, which we assume to be empty when 
the process starts on a tree. The dictionary encountered contains the nodes of 
the original representation already encountered (so that we do not loop). The 
set returnNodes is used to detect the roots of the loops. 

A node is considered “treated” when it is in the dictionary D (and so it 
represents uniquely a tree). To decide whether a node is “treated”, we just have 
to look at its key: it is “treated” if the key is full. 

representation(t)

Step 1  if t ∈ encountered then
            if encountered(t) is not treated, add it to returnNodes
            return encountered(t)
Step 2  N is a new node labeled by the empty partial key k of name
        the label of t
Step 3  for each child t_i of t do
    3a      N_i ← representation(t_i)
    3b      if N_i is treated, then add it to k
            else N.i ← N_i
Step 4  if k is full then
            if k ∈ D return D(k)
            else add k → N to D and return N
Step 5  remove N from returnNodes
Step 6  if returnNodes = ∅ then return representCycle(N)
Step 7  return N

representCycle(N)

Step 1  if shareWithDone(N) ≠ N then return shareWithDone(N)
Step 2  share(N)
Step 3  if treeKey(N) ∈ D_G then return D_G(treeKey(N))
Step 4  for each node M in the graph defined by N do
    4a      add treeKey(M) → M to D_G
    4b      add the children of M to its partial key m
    4c      add m → M to D
Step 5  return N
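4.3 Example

We present the algorithm to represent regular trees on an example, the graph
of Fig. 5, where each node is assigned a number. We will write $t_i$ for the tree
represented by the node number $i$.

[Figure: the example graph, with numbered nodes. Drawing not recoverable from the source.]

Fig. 5. Example

representation($t_1$) calls representation for $t_2$, $t_5$ and $t_6$. The call to
representation on $t_2$ will return the node $\square_2$. It will also store various nodes
in $D$, and in particular $(a) \to \square_4$. The call on $t_5$ will just return an untreated
node $\square_5$, with nothing added in the dictionaries. The call on $t_6$ will recognize
on step 4 that $(a)$ is in $D$ and so it will return $\square_4$.

Thus, at step 5, returnNodes = $\{\square_1\}$ becomes empty, and we call
representCycle with the graph whose root is labeled by the partial key
$(f, \square_2 \bullet \square_4)$² [drawing not recoverable from the source]. shareWithDone returns
the node $\square_2$. So the return value of representation on $t_1$ is $\square_2$, the node
labeled by $f$ in the final graph [drawing not recoverable from the source]. The
dictionaries will be:

$D = \{(a) \to \square_4,\ (g, \square_3\square_2) \to \square_3,\ (f, \square_2\square_3\square_4) \to \square_2\}$

$D_G$ = { [two tree keys, drawn as trees built from the partial keys $(g, \bullet\bullet)$ and $(f, \bullet\bullet\square_4)$ with leaves $\varepsilon$, mapped to the nodes $\square_2$ and $\square_3$; layout not recoverable from the source] }

² Remember that $(f, \square_2 \bullet \square_4)$ is the partial key which is not defined on its second
component.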







4.4 Proof of the Algorithm 

The algorithm returns the node of a graph. We must prove that this graph 
represents the same tree as the original graph, and that it is a graph of maximal 
sharing. 

First, notice that the algorithm terminates, because of the dictionary
encountered, which implies that each node of the original graph is treated only
once.

The correctness of the algorithm is derived from the fact that we return the
same graph as the original, except when we recognize that an equivalent node
had already been encountered (through the node keys or the tree keys), in which
case we replace one node by the other. This is the case in step 4 of representation,
and in steps 1, 2 and 3 of representCycle.

The fact that the resulting graph has the minimal number of nodes lies
in the use of the dictionaries $D$ and $D_G$ to ensure that we never duplicate
any node. The dictionary $D$ contains the node keys of every node encountered,
and the dictionary $D_G$ contains the tree key of every node of every strongly
connected graph with minimal number of nodes we encounter. We can prove that
each time we definitively introduce new nodes, there is no duplication. Definitive
introduction is performed at two points: step 4 of representation, and step 4
of representCycle.

In step 4 of representation, we know that the key $k$ is not in $D$. Moreover,
each one of the $N_i$ composing the key is unique because nodes in partial keys
have already been treated. So if a tree $f(t_0, \ldots, t_{n-1})$ had already been encountered,
the key $(f, (N_i)_{i<n})$ would already have been encountered.

In step 4 of representCycle, we know that the key treeKey(share($N$)) has
never been encountered before. Because such a key is valid for strongly connected
graphs, it means that no other node $M$ such that $M =_{tree} N$ has been
encountered before. But the problem is that we have a partial key semantics on
these graphs, and $=_{tree} \subset =_{pk}$, so we could have $M \neq_{tree} N$ but $M =_{pk} N$, in
effect representing the same tree. Because $M \neq_{tree} N$, there is a path $p$ such
that $M.p$ and $N.p$ do not have the same label, $k_M$ and $k_N$. But as $N$ and $M$
represent the same tree, $k_M$ and $k_N$ must have the same name, so their only
possible difference is in the partial function. It means there is an $i$ such that
one of the keys is defined on $i$ and not the other key (if both of them were
defined on $i$, their value would be the same on $i$, as the nodes in partial keys are
unique representations). By construction, the nodes $M$ and $N$ are in strongly
connected graphs. So if one of the keys is not defined on $i$, there is a $q$ such that
$M.p.i.q = M$ or $N.p.i.q = N$. If $t$ is the tree represented by both nodes, it means
that $t[p.i.q] = t$. Suppose $k_M$ is defined on $i$, then there is a node reachable from
$k_M(i)$ which represents the same tree as $M$, and as such it would have been
found by shareWithDone. So the graph defined by $M$ would never have gone
beyond step 1 of representCycle. It means that another representative is
stored for the cycle (we go on like this until we find one which is equivalent to
$N$, which means that the test of step 3 could not have been false). If $k_N$ is defined
on $i$, by the same argument, we could not have gone beyond step 1, and so
no new node is created.

If no node equivalent to N has been encountered, it is the same for every other 
node M in the graph represented by N. It is due to the strong connectivity of 
the graph which implies that if M has already been encountered, N has already 
been encountered. 



5 Complexity Issues 

Algorithms on shared trees can be more difficult than standard algorithms on 
trees, because we must keep the uniqueness of the representation, and for effi- 
ciency, we must do it incrementally. Comparing complexities of algorithms on 
the two representations (the naive and the sharing ones) is difficult, though. The 
complexity is measured with respect to the size of the inputs of the algorithms, 
which can be reduced to the number of nodes of the inputs in our case. In the 
case of shared regular trees, the number of nodes is exactly the number of di- 
stinct subtrees of the tree, but when the tree is not shared, the number of nodes 
can be of any value greater than the number of distinct subtrees. In the sequel, 
we denote by n this number of nodes, but we must keep in mind that this n can 
be much bigger in the case of non-shared trees. 

The basic property of shared trees is the uniqueness of the representation. 
Thus, testing tree equality is really immediate: we just compare the memory 
location of the root. In the classic case, the best method uses a partitioning 
algorithm. Another case where we can avoid such a computation with shared 
trees is testing if a tree is a subtree of another one. In the shared case, we just 
have to compare the root of the first tree with all the nodes of the second one. 
Not only is it linear, but the second tree is very likely to have far fewer nodes in
the shared case than in the classic representation.
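Under the assumption of maximal sharing, the subtree test described above amounts to a reachability search that compares references. A minimal Python sketch, with a hypothetical `Node` class and function `is_subtree` introduced for this illustration:

```python
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def is_subtree(small, big):
    """True iff node `small` is reachable from node `big`.

    This decides the subtree relation only when every distinct tree has
    exactly one node, which is the property maintained by the algorithm.
    """
    seen, stack = set(), [big]
    while stack:
        node = stack.pop()
        if node is small:
            return True
        if id(node) in seen:          # cycles are allowed: visit each node once
            continue
        seen.add(id(node))
        stack.extend(node.children)
    return False


# A shared cyclic graph: f(a, g(f(...))).
a = Node("a")
g = Node("g")
f = Node("f", [a, g])
g.children = [f]
b = Node("b")                         # a tree that does not occur in f
```

Each node of the second tree is visited at most once, so the number of reference comparisons is bounded by its number of nodes.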

When building finite trees, we need only one operation, which we call root
construction: we give a label $f$ and the nodes $N_0, \ldots, N_{n-1}$, and we build $f(N_0, \ldots, N_{n-1})$.
Such an operation is constant time in the naive representation and in the sharing
representation for finite trees (assuming hashing is constant time [12, 3]). It is
indeed also constant time for infinite trees, but this operation does not suffice to
build any regular tree. We also need some loop building mechanism. We call this
second operation recursive construction. Considering a tree t and a label x, it
consists in replacing every edge going to x by an edge to the root, and then applying
representCycle to maintain the uniqueness of the representation. Concerning
the complexity of this algorithm, it seems that the prevailing operation is the final
(and unique) call to share, which is applied on the smallest possible subgraph,
but in the worst case, the quadratic complexity of shareWithDone will take
precedence.

Many other operations can be adapted to shared trees while preserving the 
uniqueness of the representation by derivation from the representation algo- 
rithm. But due to lack of space, we let the reader write their own adaptations. 
                             sharing representation    naive representation

testing t1 = t2              O(1)                      O((n1 + n2) log(n1 + n2))
testing t1 subtree of t2     O(n2)                     O((n1 + n2) log(n1 + n2))
building                     O(|P|)                    O(|P|)
root construction            O(1)                      O(1)
recursive construction       O(n^2)                    O(n)

Fig. 6. Summary of worst case time complexities



The summary suggests that if we are to perform equality testing, it can be
beneficial to perform sharing during the computation. What we show here are
worst-case complexities, though; the difficult cases are quite pathological and, thanks
to some simple optimizations, quite rare. The situation is very similar
to the complexity of operations on BDDs [2] compared to the operations on
boolean formulas. The size of the formula representing a given boolean function
is unbounded, but the basic operations, like conjunctions, are linear in the size
of one of the formulas whereas they are quadratic for the BDDs. Nevertheless,
in practice BDDs are far more efficient.



6 Application: Set-Based Analysis 

We propose to use these techniques to improve the representations of sets of trees. 
The expressive power of this improved representation is exactly what is needed 
in set-based analysis [9], where sets of trees are approximated by ignoring the 
dependencies between variables (an idea which was already present in [16, 11]). 



6.1 Tree Automata and Graphs 

Because the cartesian approximation eliminates any dependencies between children
of a tree, we can use deterministic top-down tree automata in set-based analysis.
The idea we use here is that deterministic top-down tree automata can be
seen as graphs, where the only properties that matter are path properties, and
so they can be represented efficiently as regular infinite trees.

A deterministic top-down tree automaton [17, 8] is a tuple $(Q, I, \delta, F)$ where
$Q$ is a finite set of states, $I \in Q$ is the initial state, $F \subseteq Q$ is a set of final
states, and $\delta : A \times Q \to Q \times \ldots \times Q$ is the transition function which takes a
label in $A$ and a state, and returns a sequence of states (as many as the arity of
the label). The corresponding graph $G$ is such that $G_N = Q$ and
$G_E = \{(q, q', a_i) \mid \delta(a, q) = (\ldots, q', \ldots) \text{ and } q' \text{ in } i\text{-th position}\}$. This connection means that we
can represent the sets used in set-based analysis without any variable name in
the representation, and in a shared way.
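The passage from an automaton to its graph can be illustrated by the following Python sketch. The encoding of $\delta$ as a dictionary, the function name `automaton_graph` and the example automaton (lists whose elements are the constant a) are assumptions made here; the edge label $a_i$ is encoded as the pair `(a, i)`.

```python
def automaton_graph(states, delta):
    """Build (G_N, G_E) from a deterministic top-down tree automaton.

    delta maps (label, state) to the tuple of successor states; a label of
    arity 0 maps to the empty tuple and therefore contributes no edge.
    """
    nodes = set(states)
    edges = {
        (q, q2, (a, i))
        for (a, q), successors in delta.items()
        for i, q2 in enumerate(successors)
    }
    return nodes, edges


# Hypothetical automaton: state "L" accepts lists, state "E" accepts elements.
delta = {
    ("cons", "L"): ("E", "L"),
    ("nil", "L"): (),
    ("a", "E"): (),
}
nodes, edges = automaton_graph({"L", "E"}, delta)
```

The state names only serve to build the edges; once the graph is shared as a regular tree, they no longer appear in the representation.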
6.2 Tree Skeletons

In order to represent the sets of set-based analysis as trees, we use a new label
to represent the anonymous states of the tree automata. This label, which we
call a choice label, corresponds to a possible union in the interpretation of the
infinite tree. We denote this label ○. We call an infinite tree with this extra
label a tree skeleton. The set of trees represented by a tree skeleton is defined³
by a set of equations. [Equations defining Set not recoverable from the source.]




In order to have a unique representation of the sets of trees (and so keep 
the constant time equality testing and memoizing properties), we make some 
restrictions on what infinite trees are considered valid tree skeletons. First we 
eliminate unnecessary choices: if a choice node has only one child, it is replaced 
by its child. If a choice node is the child of a choice node, it is replaced by its 
children. We perform the cartesian approximation: if two children of a choice 
node have the same label, they are merged (replaced by their cartesian upper 
approximation). Finally, the children of a choice node are ordered according to 
their labels. See the summary of figure 7. 
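The four rules can be tried out on finite terms with the Python sketch below. It is only an illustration under assumptions: terms are pairs `(label, children)`, the choice label is encoded by the reserved string `"choice"`, two children with the same label are assumed to have the same arity, and the function `normalize` is a hypothetical name. The paper applies the rules to shared infinite regular trees, which this sketch does not handle.

```python
CHOICE = "choice"   # stands for the choice label; assumed not to be a real label


def normalize(t):
    """Return a valid skeleton for the finite term t = (label, children)."""
    label, kids = t
    kids = [normalize(k) for k in kids]
    if label != CHOICE:
        return (label, tuple(kids))
    # A choice node that is the child of a choice node is replaced by its children.
    flat = []
    for k in kids:
        flat.extend(k[1] if k[0] == CHOICE else [k])
    # Cartesian approximation: children with the same label are merged
    # component by component.
    by_label = {}
    for lab, args in flat:
        by_label.setdefault(lab, []).append(args)
    merged = []
    for lab in sorted(by_label):          # children ordered by their labels
        groups = by_label[lab]
        if len(groups) == 1:
            merged.append((lab, groups[0]))
        else:
            columns = zip(*groups)
            merged.append((lab, tuple(normalize((CHOICE, col)) for col in columns)))
    # A choice node with only one child is replaced by its child.
    if len(merged) == 1:
        return merged[0]
    return (CHOICE, tuple(merged))


A, B, C = ("a", ()), ("b", ()), ("c", ())
raw = (CHOICE, (("f", (A, B)), (CHOICE, (("f", (C, B)), A))))
skeleton = normalize(raw)
```

On this input the two children labeled f are merged into a single f whose first component is a choice between a and c and whose second component is b, and the remaining children are sorted by label.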



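[Figure: the rewriting rules on choice nodes. A choice node with a single child is replaced by that child; a choice node that is the child of a choice node is replaced by its children; in a valid skeleton, all $t_i(\varepsilon)$, the root labels of the children of a choice node, are in strict order. Drawings not recoverable from the source.]

Fig. 7. Rules to obtain a valid tree skeleton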



Any deterministic top-down tree automaton can be represented by a valid
tree skeleton. Consider an automaton $(Q, I, \delta, F)$. We first build the infinite tree
labeled by $Q$ and $A$, such that the root is labeled by $I$, the children of a given
state $q$ are the different $a$ such that $\delta(a, q)$ is defined, and the children of such an $a$
are the components of $\delta(a, q)$. This tree is regular because there is at most one subtree labeled
by a given $q \in Q$, and at most $|Q|$ subtrees labeled by a given $a \in A$. The second
step consists in removing every label of arity 0 which does not come from a state
in $F$, and in replacing every state by ○. Then we derive the valid tree skeleton.

³ Set is defined as the least fixpoint of this set of equations. The ordering is the
pointwise ordering of the inclusion of the images. If we wanted to include infinite
trees (as in [5]), we would take the greatest fixpoint.

6.3 Using Tree Skeletons in Analysis 

Manipulation of tree skeletons uses basic algorithms on shared infinite regular 
trees. Once we can keep the maximal sharing property, it is easy to keep track of 
the other rules for tree skeletons. Then tree skeletons can be used everywhere we 
consider a set of trees in the analysis. It can replace some of the tree automata 
of [7] (if we keep the original restrictions of set-based analysis), or the tree 
grammars of [13], as the approximation on union corresponds indeed to cartesian 
approximation. 

In practice, you can try to use the toolbox under development at the following
address: http://www.di.ens.fr/~mauborgn/skeleton.tar.gz.

7 Conclusion 

While trying to improve the representation of sets of trees in set-based analysis, 
we presented generic algorithms to manipulate efficiently any structure encoded 
as infinite regular trees. These algorithms allow a very compact representation 
of such structures and a constant time equality testing. One of their advantages 
is their incrementality which allows their use on dynamic structures. The com- 
plexity analysis cannot describe the potential benefit of this new representation, 
but it suggests the same gain as for Binary Decision Diagrams which use similar 
techniques. 

We also described a new way of representing sets of trees using infinite regular 
trees. This new representation is sharing, incremental and unique. Current work 
includes the integration of the representation in an actual analyzer to show 
experimentally its benefits. 
Abstract. We prove the correctness of the translation of a prototypic
While-language with nested, parameterless procedures to an abstract
assembler language with finite stacks. A variant of the well-known wp and
wlp predicate transformers, the weakest relative precondition transformer
wrp, together with a symbolic approach for describing semantics of
assembler code allows us to explore assembler programs in a manageable
way and to ban finiteness from the scene early.

Keywords: compiler, correctness, refinement, resource-limitation, pre- 
dicate transformer, procedure, verification 

1 Introduction 

The construction of compilers is one of the oldest and best studied topics in 
computer science and neither the interest in this subject nor its importance 
has declined. Though the range of application of compiler technology has grown, 
there is still a great need for further understanding the classical setup of program 
translation. Even if we trust a source program or prove it correct, we cannot rely 
on the executed object code, if compilation may be erroneous. This motivates 
us to study the question of how to construct verified compilers. 

Trusted compilers would make it possible to certify safety-critical code at the source
code level, which promises to be less time-consuming, cheaper, and more reliable
than the current practice of inspecting the generated machine code [7,13]. The
ultimate goal of compiler verification [1,2,4,6,8,9,11,12] is to justify such confidence
in compilers.

In [10] we studied the question of what semantic relationship we can expect
to hold between a target program and the source program from which it was
generated. Two natural candidate properties from the point of view of program
verification are preservation of total correctness (PTC) and preservation of
partial correctness (PPC). They require that all total or partial correctness
assertions valid for the source program remain valid for the target program. Another
characterization is as refinement of the wp and wlp transformers [3] associated with
the source and target program. We argued, however, that neither PTC nor PPC
is guaranteed by practical compilers. Limited resources on the target processor
prohibit the former: PTC implies that the target program terminates regularly,
i.e. without a run-time error, whenever regular termination is guaranteed for
the source program. But when we implement a source language with full recursion
on a finite machine, "StackOverflow" errors will be observed every now and
then. On the other hand, optimizing compilers generally do not preserve partial
correctness because common transformations, like dead-code elimination, may
eliminate code from the program that causes a run-time error. Thus, run-time
errors may be replaced by arbitrary results.

As a remedy we proposed in [10] the more general notion of preservation of
relative correctness (PRC) (recalled in Sect. 3). Relative correctness is
parameterized by a set A of accepted failures and thus allows us, in contrast to partial
or total correctness, to treat runtime errors and divergence differently. We also
studied a corresponding family of predicate transformers $wrp_A$. It is convenient
to refer to predicate transformer (PT) semantics in compiler proofs because
there is a powerful data refinement theory for PTs and refinement proofs can be
presented in a calculational style by using algebraic laws [5,6,9]. PTs also
interface directly to correctness proofs for source programs. wrp is meant to permit
an elegant treatment of runtime errors and finiteness of machines while staying
in the familiar and well-studied realm of predicates and predicate transformers.

The main purpose of the current paper is to show that $\mathrm{wrp}$ keeps this
promise. More specifically, we employ $\mathrm{wrp}$-based reasoning to prove
correct the translation of a prototypical While-language with nested,
parameterless procedures to an abstract assembler language with finite stacks,
a proof that is also of independent interest. We focus on the control flow
implementation by jumps and a return address stack. Due to finiteness of
stacks, regular termination of target programs generated from terminating
source programs cannot be guaranteed. Nevertheless, $\mathrm{wrp}$ allows us to
establish a variant of PTC in which "StackOverflow" is treated as an accepted
failure. As intended, finiteness of stacks vanishes from the scene very early:
by taking into account that "StackOverflow" is an accepted error, the laws
about $\mathrm{wrp}$ derived from the operational semantics are akin to the
ones of an idealized assembler with unbounded stacks. Thus, $\mathrm{wrp}$
allows us to reason about implementations on finite machines without burdening
the verification. Another interesting aspect of our proof is that we employ
symbolic ways of reasoning about assembler language semantics instead of
referring to more conventional descriptions by means of an instruction pointer.

The remainder of this paper is organized as follows. Sect. 2 recalls the basics 
of predicates and predicate transformers. Preservation of relative correctness is 
discussed briefly in Sect. 3. The abstract assembler language which will serve as 
the target language is presented in Sect. 4 before the source language, a more 
common high-level language, is introduced in Sect. 5. The translation scheme is 
defined in Sect. 6 and the actual correctness proof is given in Sections 7 and 8. 
We conclude with some remarks in Sect. 9. 

2 Preliminaries

Predicates. Assume given a set $\Sigma$ of states $s$; typically a state is a
mapping from variables to values. We identify predicates with the set of
satisfying states, so predicates are of type $\mathit{Pred} = 2^{\Sigma}$,
ranged over by $\varphi$ and $\psi$. $\mathit{Pred}$, ordered by set inclusion,
forms a complete Boolean lattice with top-element $\mathit{true} = \Sigma$ and
bottom-element $\mathit{false} = \emptyset$.

Predicate transformers. A predicate transformer (PT) is a mapping
$f : \mathit{Pred} \to \mathit{Pred}$. Sequential composition of two predicate
transformers $f$ and $g$ is defined by $(f ; g)(\psi) = f(g(\psi))$ and, hence,
is associative and has the identity $\mathrm{Id}$, $\mathrm{Id}(\psi) = \psi$,
as unit. We restrict the set of PTs to the monotonic ones because this makes
sequential composition monotonic. $\mathit{PTrans} = (\mathit{Pred} \to
\mathit{Pred})$, together with the lifted order $\le$ defined by
$f \le g \iff \forall \varphi \in \mathit{Pred} : f(\varphi) \subseteq g(\varphi)$
for $f, g \in \mathit{PTrans}$, is also a complete lattice with top-element
$\top$, $\top(\psi) = \mathit{true}$, and bottom-element $\bot$,
$\bot(\psi) = \mathit{false}$.
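
The definitions above can be made concrete on a small finite state space. The following Python fragment is only an illustrative sketch: the three-element state set and all function names are assumptions introduced here, not part of the paper. Predicates are represented as frozensets of states and predicate transformers as ordinary functions on them.

```python
from itertools import combinations

# Hypothetical toy state space (an assumption for illustration only).
STATES = frozenset({"s0", "s1", "s2"})

def all_predicates(states):
    """Enumerate Pred = 2^Sigma as a list of frozensets."""
    items = sorted(states)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

PRED = all_predicates(STATES)
TRUE, FALSE = STATES, frozenset()

def compose(f, g):
    """Sequential composition: (f ; g)(psi) = f(g(psi))."""
    return lambda psi: f(g(psi))

def identity(psi):
    """The unit Id of sequential composition."""
    return psi

def top(psi):
    """Top element of PTrans: maps every predicate to true."""
    return TRUE

def bottom(psi):
    """Bottom element of PTrans: maps every predicate to false."""
    return FALSE

def refines(f, g):
    """Lifted order: f <= g iff f(phi) is a subset of g(phi) for all phi."""
    return all(f(phi) <= g(phi) for phi in PRED)

def is_monotonic(f):
    """Check monotonicity of f by brute force over all pairs of predicates."""
    return all(f(p) <= f(q) for p in PRED for q in PRED if p <= q)
```

On such a finite space the lattice properties can be checked exhaustively, which is of course not possible for the infinite state spaces the paper has in mind.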

Fixpoints in complete lattices. The famous theorem of Knaster and Tarski
ensures that every monotonic function $f$ on a complete lattice $(L, \le)$ has
a least fixpoint $\mu f$ and a greatest fixpoint $\nu f$. A well-known means
for proving properties concerning fixpoints is the following.

Theorem 1 (Fixpoint induction). For $P \subseteq L$ one has $\mu f \in P$
provided that:

1. $\forall C \subseteq P$ : $C$ is totally ordered $\Rightarrow \bigvee C \in P$. (Admissibility)

2. $\bot \in P$. (Base Case)

3. $\forall x \in P : x \le f(x) \Rightarrow f(x) \in P$. (Induction Step)
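
On a finite lattice both fixpoints can be computed by simple iteration, which may help to see how they differ. The sketch below is an assumption-laden illustration (the graph `EDGES` and the function `f` are invented for this example): it iterates a monotonic function on a powerset lattice upwards from the bottom element and downwards from the top element.

```python
def lfp(f, bottom):
    """Least fixpoint of a monotonic f on a finite lattice (iterate from bottom)."""
    x = bottom
    while f(x) != x:
        x = f(x)
    return x

def gfp(f, top):
    """Greatest fixpoint of a monotonic f on a finite lattice (iterate from top)."""
    x = top
    while f(x) != x:
        x = f(x)
    return x

# Hypothetical directed graph used only to obtain a monotonic function.
EDGES = {0: [1], 1: [2], 2: [2], 3: [3], 4: []}
UNIVERSE = frozenset(EDGES)

def f(xs):
    """Monotonic on subsets of UNIVERSE: node 0 plus all successors of xs."""
    return frozenset({0}) | frozenset(m for n in xs for m in EDGES[n])

least = lfp(f, frozenset())      # nodes reachable from 0
greatest = gfp(f, UNIVERSE)      # largest set closed under the same equation
```

Here the two fixpoints differ (node 3 is in the greatest but not in the least fixpoint). This is the same distinction that later decides between the two readings of loops and recursion, depending on whether divergence is accepted.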

3 Relativized Predicate Transformers 

In this section we recall relative correctness and relativized predicate transfor- 
mers, which were introduced and discussed at length in [10], focusing on what’s 
important for our purposes. 

We consider imperative programs $\pi$ intended to compute on a certain
non-empty set of states $\Sigma$. For the moment, the details of program
execution are not of interest; we are only interested in the final outcomes of
computations. We thus assume that each program $\pi$ is furnished with a
relation $R(\pi) \subseteq \Sigma \times (\Sigma \cup \Omega)$, where $\Omega$
is a non-empty set of failure (or irregular) outcomes¹ disjoint from $\Sigma$.
Typically, $\Omega$ contains error states like "DivByZero" and "StackOverflow"
and a special symbol $\infty$ representing divergence.

We use the following conventions for the naming of variables: $\Sigma$ is
ranged over by $s$, $\Omega$ by $\omega$, and $\Sigma \cup \Omega$ by $\sigma$.
Intuitively, $(s, s') \in R(\pi)$ records that $s'$ is a possible regular
result of $\pi$ from initial state $s$, $(s, \omega) \in R(\pi)$ means that
error state $\omega \in \Omega \setminus \{\infty\}$ can be reached from $s$,
and $(s, \infty) \in R(\pi)$ that $\pi$ may diverge from $s$, i.e., run
forever. $R(\pi)$ can be thought of as derived from an operational or
denotational semantics. An example is discussed in Sect. 4.

¹ We use the more neutral word 'outcome' instead of 'result' because some people
object to the idea that divergence is a result of a program.

Relative correctness. When evaluating partial correctness assertions all
irregular outcomes of programs are accepted; in contrast, in total correctness
assertions all irregular outcomes are taken as disproof. Relative correctness
is built around the idea of parameterizing assertions w.r.t. the set of
accepted outcomes. The irregular outcomes that are not accepted are taken as
disproof.

Assume given a set $A \subseteq \Omega$ of accepted outcomes; this set may
contain divergence as in partial correctness. For a given precondition
$\varphi$ and postcondition $\psi$ we call program $\pi$ relatively correct
w.r.t. $\varphi$, $\psi$ and $A$ if each $\pi$-computation starting in a state
satisfying $\varphi$ terminates regularly in a state satisfying $\psi$ or has
an accepted outcome in $A$ (e.g. $\pi$ may diverge if $\infty \in A$). More
formally:

\[
\langle\varphi\rangle\,\pi\,\langle\psi\rangle_A
\quad\text{iff}\quad
\forall s, \sigma : s \in \varphi \wedge (s, \sigma) \in R(\pi) \Rightarrow \sigma \in \psi \cup A .
\]

The classical notions of partial and total correctness are special cases:
partial correctness amounts to $\langle\varphi\rangle\,\pi\,\langle\psi\rangle_{\Omega}$
and total correctness to $\langle\varphi\rangle\,\pi\,\langle\psi\rangle_{\emptyset}$.

Weakest relative preconditions. Relative correctness gives rise to a
corresponding predicate transformer semantics of programs. The weakest relative
precondition of $\pi$ w.r.t. $\psi$ and $A$ is the set of regular states from
which all $\pi$-computations either terminate regularly in a state satisfying
$\psi$ or have an outcome in $A$:

\[
\mathrm{wrp}_A(\pi)(\psi) = \{ s \in \Sigma \mid \forall \sigma : (s, \sigma) \in R(\pi) \Rightarrow \sigma \in \psi \cup A \} .
\]

Note that $\mathrm{wrp}_A(\pi) \in \mathit{PTrans}$. Dijkstra's wlp and wp
transformers [3] are just the border cases of wrp:
$\mathrm{wrp}_{\Omega} = \mathrm{wlp}$ and $\mathrm{wrp}_{\emptyset} = \mathrm{wp}$.
There is a fundamental difference between wlp and wp regarding the fixpoint
definition of repetitive and recursive constructs, which generalizes as follows
to the $\mathrm{wrp}_A$ transformers: if $\infty \in A$ we must refer to
greatest fixpoints, otherwise to least ones.

The following equivalence generalizes the well-known characterization of
partial and total correctness in terms of wlp and wp:

\[
\varphi \subseteq \mathrm{wrp}_A(\pi)(\psi) \iff \langle\varphi\rangle\,\pi\,\langle\psi\rangle_A .
\]
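
For a finite outcome relation the weakest relative precondition can be computed directly from its definition. The following Python sketch is illustrative only: the three-state space, the relation `R`, and the function names are assumptions made for this example. It shows how the choice of the accepted set `A` moves the result between the wp-like and the wlp-like border cases.

```python
INF = "inf"                                   # stands for divergence
OMEGA = frozenset({"DivByZero", "StackOverflow", INF})
SIGMA = frozenset({0, 1, 2})                  # hypothetical regular states

# Hypothetical outcome relation R(pi), a subset of SIGMA x (SIGMA u OMEGA).
R = {(0, 1), (1, "StackOverflow"), (1, 2), (2, INF)}

def wrp(rel, accepted, psi):
    """wrp_A(pi)(psi) = {s | every outcome of s lies in psi or in A}."""
    return frozenset(
        s for s in SIGMA
        if all(out in psi or out in accepted
               for (start, out) in rel if start == s)
    )

def relatively_correct(rel, phi, psi, accepted):
    """The assertion <phi> pi <psi>_A, checked directly on the relation."""
    return all(out in psi or out in accepted
               for (start, out) in rel if start in phi)

psi = frozenset({1, 2})
total_like = wrp(R, frozenset(), psi)                     # A empty: wp
relative = wrp(R, frozenset({"StackOverflow"}), psi)      # overflow accepted
partial_like = wrp(R, OMEGA, psi)                         # A = Omega: wlp
```

With overflow accepted, state 1 enters the precondition although one of its computations fails, while state 2 stays excluded because divergence is still not accepted.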

Preserving relative correctness. A natural way to approach translation
correctness is to focus on properties that transfer from source to target
programs. Suppose, for instance, that $\pi$ is a source program and $\pi'$ is
its translation. We say that the translation preserves relative correctness
w.r.t. $A$ if

\[
\forall \varphi, \psi : \langle\varphi\rangle\,\pi\,\langle\psi\rangle_A \Rightarrow \langle\varphi\rangle\,\pi'\,\langle\psi\rangle_A , \qquad (1)
\]

i.e., if all relative correctness assertions transfer from $\pi$ to $\pi'$. It
is straightforward to show that (1) is equivalent to the refinement inequality
$\mathrm{wrp}_A(\pi) \le \mathrm{wrp}_A(\pi')$. Refinement between predicate
transformers can be established by algebraic calculations. We can thus take
advantage of such algebraic calculations in semantic compiler proofs. The
remaining part of this section is devoted to providing suitable notations that
enable this in the scenario studied in this paper.
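
The equivalence between (1) and the refinement inequality can be spelled out in two short steps. The derivation below is a sketch reconstructed from the definitions of this section, not quoted from [10]; it uses only the characterization of relative correctness in terms of $\mathrm{wrp}_A$ stated above.

\[
\begin{array}{ll}
(\Leftarrow) & \text{Assume } \mathrm{wrp}_A(\pi) \le \mathrm{wrp}_A(\pi') \text{ and } \langle\varphi\rangle\,\pi\,\langle\psi\rangle_A . \\
 & \text{Then } \varphi \subseteq \mathrm{wrp}_A(\pi)(\psi) \subseteq \mathrm{wrp}_A(\pi')(\psi), \text{ hence } \langle\varphi\rangle\,\pi'\,\langle\psi\rangle_A . \\[1ex]
(\Rightarrow) & \text{Assume (1) and fix } \psi . \text{ Choose } \varphi = \mathrm{wrp}_A(\pi)(\psi) . \\
 & \text{Then } \langle\varphi\rangle\,\pi\,\langle\psi\rangle_A \text{ holds trivially, so by (1) } \langle\varphi\rangle\,\pi'\,\langle\psi\rangle_A , \\
 & \text{i.e. } \mathrm{wrp}_A(\pi)(\psi) \subseteq \mathrm{wrp}_A(\pi')(\psi) . \text{ As } \psi \text{ was arbitrary, } \mathrm{wrp}_A(\pi) \le \mathrm{wrp}_A(\pi') .
\end{array}
\]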

Concrete predicate transformers. Suppose given three basic sets of syntactic
objects: a set $\mathit{Var}$ of variables $x$, a set $\mathit{Expr}$ of
expressions $e$, and a set $\mathit{BExpr}$ of Boolean expressions $b$. We
assume interpretation functions for expressions and Boolean expressions
$\mathcal{E}(e) : \Sigma \to (\mathit{Val} \cup \Omega)$ and
$\mathcal{B}(b) : \Sigma \to (\mathbb{B} \cup \Omega)$; here $\mathit{Val}$ is
the value set of variables and the set $\mathbb{B} = \{\mathit{tt}, \mathit{ff}\}$
represents the truth values. For the remainder of this paper, states are
valuations of variables, i.e. $\Sigma = (\mathit{Var} \to \mathit{Val})$.
Intuitively, results $\mathcal{E}(e)(s), \mathcal{B}(b)(s) \in \Omega$
represent failures (including divergence) arising during evaluation of
(Boolean) expressions.

In order to deal with partially defined expressions we assume special types of
basic predicates:
$\mathrm{def}(e) \stackrel{\mathrm{def}}{=} \{ s \mid \mathcal{E}(e)(s) \in \mathit{Val} \}$
and
$\mathrm{in}_A(e) \stackrel{\mathrm{def}}{=} \{ s \mid \mathcal{E}(e)(s) \in A \}$
for expressions $e$ and $A \subseteq \Omega$. Analogously, we have the
predicates $\mathrm{def}(b)$, $\mathrm{in}_A(b)$ and also
$(b = \mathit{tt}) \stackrel{\mathrm{def}}{=} \{ s \mid \mathcal{B}(b)(s) = \mathit{tt} \}$
and $(b = \mathit{ff})$ for Boolean expressions $b$.

We consider an assignment $x := e$. The expression $e$ is evaluated in some
given state, and if evaluation delivers a regular result it is assigned to $x$.
But evaluation of $e$ might also fail with an outcome $\omega$. Whether we
consider this acceptable depends on whether $\omega \in A$ or not. Hence, the
weakest relative precondition of this assignment w.r.t. $A$ and postcondition
$\psi$ is given by

\[
(x :=_A e)(\psi) \stackrel{\mathrm{def}}{=} \mathrm{in}_A(e) \cup (\mathrm{def}(e) \cap \psi[e/x]) .
\]

Another example is a conditional with branches $P$ and $Q$ guarded by $b$,
where the PTs $P$ and $Q$ are $\mathrm{wrp}_A$-transformers. Obviously the
weakest relative precondition w.r.t. $A$ and $\psi$ of this construct is
$P(\psi)$ resp. $Q(\psi)$ if $b$ evaluates to $\mathit{tt}$ resp.
$\mathit{ff}$. Since evaluation of $b$ can also fail, the weakest relative
precondition PT w.r.t. $A$ and postcondition $\psi$ is given by

\[
(P \lhd b/A \rhd Q)(\psi) \stackrel{\mathrm{def}}{=} \mathrm{in}_A(b) \cup ((b = \mathit{tt}) \cap P(\psi)) \cup ((b = \mathit{ff}) \cap Q(\psi)) .
\]
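
The two combinators can be tried out on a small state space. The Python sketch below is an illustration under explicit assumptions: states are pairs of values for two hypothetical variables `x` and `y` over the value set {0, 1, 2}, expressions are Python functions that return either a value or a failure name, and all identifiers are invented for this example.

```python
from itertools import product

VARS = ("x", "y")                 # hypothetical variables
VAL = (0, 1, 2)                   # hypothetical value set
SIGMA = frozenset(product(VAL, repeat=len(VARS)))   # a state is a pair (x, y)

def get(s, var):
    return s[VARS.index(var)]

def update(s, var, value):
    t = list(s)
    t[VARS.index(var)] = value
    return tuple(t)

def assign(var, e, accepted):
    """(var :=_A e)(psi) = in_A(e) u (def(e) n psi[e/var])."""
    def pt(psi):
        result = set()
        for s in SIGMA:
            r = e(s)
            if r in VAL:                      # s in def(e)
                if update(s, var, r) in psi:  # s in psi[e/var]
                    result.add(s)
            elif r in accepted:               # s in in_A(e)
                result.add(s)
        return frozenset(result)
    return pt

def cond(p, b, accepted, q):
    """(P <| b/A |> Q)(psi) = in_A(b) u (b=tt n P(psi)) u (b=ff n Q(psi))."""
    def pt(psi):
        p_psi, q_psi = p(psi), q(psi)
        result = set()
        for s in SIGMA:
            r = b(s)
            if r is True:
                if s in p_psi:
                    result.add(s)
            elif r is False:
                if s in q_psi:
                    result.add(s)
            elif r in accepted:
                result.add(s)
        return frozenset(result)
    return pt

def div(s):
    """Hypothetical expression y / x that fails with "DivByZero" when x = 0."""
    return get(s, "y") // get(s, "x") if get(s, "x") != 0 else "DivByZero"

strict = assign("y", div, frozenset())(SIGMA)               # failure rejected
lenient = assign("y", div, frozenset({"DivByZero"}))(SIGMA) # failure accepted
```

With the failure rejected, only states with a non-zero `x` remain in the precondition; once "DivByZero" is placed in the accepted set, every state qualifies.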

4 An Abstract Assembler Language 

Syntax. The language defined in this section is intended to capture the essence
of flat, unstructured assembler code. Our main interest is a realistic
treatment of control structures. Therefore, labels $l$ taken from a set
$\mathit{Lab}$ are used to mark the destination of jump instructions, as is
common in assembler languages. In order to keep things manageable, the language
works on a state space with named variables and we provide instructions
embodying entire (Boolean) expressions: $\mathrm{asg}(x, e)$ and
$\mathrm{cj}(b, l)$. Such instructions should be thought of as 'macros'
representing a sequence of more concrete assembler instructions. A language of
this kind might be used as a stepping stone on the way down to actual binary
machine code.

The set $\mathit{Instr}$ consists of instructions of the following form.

- $\mathrm{asg}(x, e)$: an assignment instruction,

- $\mathrm{cj}(b, l)$: a conditional jump (on false) to label $l$,

- $\mathrm{jsr}(l)$: a subroutine jump to label $l$, and

- $\mathrm{ret}$: a return jump.

We write $\mathrm{goto}(l)$ for $\mathrm{cj}(\mathit{false}, l)$. It represents
an unconditional jump.

An assembler (or machine) program $m$ is a finite sequence consisting of
instructions and labels where we assume unique labeling, formally

\[
m \in \mathit{MP} \stackrel{\mathrm{def}}{=} \{ m \in (\mathit{Instr} \cup \mathit{Lab})^* \mid \forall i, j : m_i = m_j \in \mathit{Lab} \Rightarrow i = j \} .
\]

Concatenation of programs is denoted by an infix dot $\cdot$. A program $m$ is
called closed if every label that has an applied occurrence in $m$ also has a
defining occurrence. The set of closed programs is denoted by $\mathit{CMP}$.
Here is an example of a closed program computing the factorial of $x$ leaving
the result in variable $y$:

\[
\mathrm{asg}(y, 1) \cdot \mathit{Loop} \cdot \mathrm{cj}(x \neq 0, \mathit{End}) \cdot \mathrm{asg}(y, x * y) \cdot \mathrm{asg}(x, x - 1) \cdot \mathrm{goto}(\mathit{Loop}) \cdot \mathit{End}
\]

Basic operational semantics. A processor executing a machine program will
typically use an instruction pointer that points to the next instruction to be
executed at any given moment. For reasoning about assembler code, however,
it is more convenient to represent the current control point in a more symbolic
manner: we partition the executed program $m$ into two parts $u$, $v$ such that
$m = u \cdot v$ and the next instruction to be executed is just the first
instruction of $v$. Progress of execution can nicely be expressed by
partitioning the same code sequence differently. $\mathit{PMP}$ (partitioned
machine programs) denotes the set of pairs $(u, v)$ such that
$u \cdot v \in \mathit{CMP}$.

Similarly, we prefer to work with a symbolic representation of the stack of 
return addresses; such a stack is necessary to execute jump-subroutine and return 
instructions. The idea is to use a stack of partitioned code sequences (modeled 
by a member of PMP*) instead of a stack of addresses. 

The basic semantics of the abstract assembler language is an operational
semantics built around the ideas just described. It works on configurations of
the form $\langle u, v, a, s \rangle$, where $(u, v) \in \mathit{PMP}$ models
the current control point ($u \cdot v$ is the executed program),
$a \in \mathit{PMP}^*$ is the symbolic representation of the return stack, and
$s \in \Sigma$ is the current state. Thus,

\[
\Gamma \stackrel{\mathrm{def}}{=} \{ \langle u, v, a, s \rangle \mid (u, v) \in \mathit{PMP} \wedge a \in \mathit{PMP}^* \wedge s \in \Sigma \}
\]

is the set of regular configurations. In order to treat error situations, we
use the members of $\Omega$ as irregular configurations. Table 1 defines the
transition relation $\to \; \subseteq \Gamma \times (\Gamma \cup \Omega)$ of an
abstract machine executing assembler programs.

Let us consider the rules in more detail. [Asg1] applies if expression $e$
evaluates without error to a value in the current state $s$: the machine
changes the value of $x$ accordingly (the new state is
$s[x \mapsto \mathcal{E}(e)(s)]$) and transfers control to the subsequent
instruction by moving $\mathrm{asg}(x, e)$ to the end of the $u$-component.
[Asg2] is used if evaluation of $e$ fails in the current state: the failure
value $\mathcal{E}(e)(s)$ is just propagated. [Cj1] describes that a
conditional jump $\mathrm{cj}(b, l)$ is not taken if $b$ evaluates to
$\mathit{tt}$ in the current state: control is simply transferred to the
subsequent instruction. If $b$ evaluates to $\mathit{ff}$, rule [Cj2] applies
and control is transferred to label $l$, the position of which is determined by
the premise $u \cdot \mathrm{cj}(b, l) \cdot v = x \cdot l \cdot y$. [Cj3]
propagates errors resulting from evaluation of $b$. [Jsr1] is concerned with a
subroutine jump to label $l$. Similarly to rule [Cj2], control is transferred
to label $l$. Additionally, the machine stores the return address by pushing
$(u \cdot \mathrm{jsr}(l), v)$ onto the symbolically modeled return stack $a$.
If execution subsequently reaches a ret instruction, execution of
$(u \cdot \mathrm{jsr}(l), v)$ is resumed as specified by [Ret1]. A processor
with finite memory will not always be able to stack a return address when
executing a jsr instruction. We model this by rule [Jsr2], which allows the
machine to report "StackOverflow" spontaneously. Of course, in an actual
processor the choice between regular stacking and overflow will be mutually
exclusive and not just non-deterministic as in our model. This could be modeled
by furnishing [Jsr2] with a premise StackFull and [Jsr1] with a premise
$\neg$StackFull, where StackFull is a (complicated) condition depending on the
current state of the machine. Finally, [Ret2] reports an error if a ret
instruction is executed on an empty return stack, and [Label] allows labels to
be skipped.

Table 1. Operational semantics of the assembler language

\[
\begin{array}{ll}
[\text{Asg1}] & \dfrac{\mathcal{E}(e)(s) \in \mathit{Val}}{\langle u,\ \mathrm{asg}(x,e) \cdot v,\ a,\ s \rangle \to \langle u \cdot \mathrm{asg}(x,e),\ v,\ a,\ s[x \mapsto \mathcal{E}(e)(s)] \rangle} \\[2.5ex]
[\text{Asg2}] & \dfrac{\mathcal{E}(e)(s) \in \Omega}{\langle u,\ \mathrm{asg}(x,e) \cdot v,\ a,\ s \rangle \to \mathcal{E}(e)(s)} \\[2.5ex]
[\text{Cj1}] & \dfrac{\mathcal{B}(b)(s) = \mathit{tt}}{\langle u,\ \mathrm{cj}(b,l) \cdot v,\ a,\ s \rangle \to \langle u \cdot \mathrm{cj}(b,l),\ v,\ a,\ s \rangle} \\[2.5ex]
[\text{Cj2}] & \dfrac{\mathcal{B}(b)(s) = \mathit{ff}, \quad u \cdot \mathrm{cj}(b,l) \cdot v = x \cdot l \cdot y}{\langle u,\ \mathrm{cj}(b,l) \cdot v,\ a,\ s \rangle \to \langle x,\ l \cdot y,\ a,\ s \rangle} \\[2.5ex]
[\text{Cj3}] & \dfrac{\mathcal{B}(b)(s) \in \Omega}{\langle u,\ \mathrm{cj}(b,l) \cdot v,\ a,\ s \rangle \to \mathcal{B}(b)(s)} \\[2.5ex]
[\text{Jsr1}] & \dfrac{u \cdot \mathrm{jsr}(l) \cdot v = x \cdot l \cdot y}{\langle u,\ \mathrm{jsr}(l) \cdot v,\ a,\ s \rangle \to \langle x,\ l \cdot y,\ a \cdot (u \cdot \mathrm{jsr}(l), v),\ s \rangle} \\[2.5ex]
[\text{Jsr2}] & \langle u,\ \mathrm{jsr}(l) \cdot v,\ a,\ s \rangle \to \text{"StackOverflow"} \\[1ex]
[\text{Ret1}] & \langle u,\ \mathrm{ret} \cdot v,\ a \cdot (x, y),\ s \rangle \to \langle x,\ y,\ a,\ s \rangle \\[1ex]
[\text{Ret2}] & \langle u,\ \mathrm{ret} \cdot v,\ \varepsilon,\ s \rangle \to \text{"EmptyStack"} \\[1ex]
[\text{Label}] & \langle u,\ l \cdot v,\ a,\ s \rangle \to \langle u \cdot l,\ v,\ a,\ s \rangle
\end{array}
\]
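
The transition rules can be prototyped as a small interpreter over partitioned programs. The Python sketch below makes several assumptions that are not part of the paper: programs are tuples whose labels are strings and whose instructions are tuples, expressions are Python functions returning a value or a failure name, and the non-deterministic rule [Jsr2] is replaced by the deterministic StackFull variant with a hypothetical bound `max_depth`. The parameter `fuel` merely cuts off long runs and is likewise an assumption.

```python
def split_at(prog, label):
    """Find the unique x, l.y with x . l . y = prog."""
    i = prog.index(label)
    return prog[:i], prog[i:]

def step(u, v, a, s, max_depth):
    """One transition from configuration <u, v, a, s>; returns a new
    configuration or a failure name (a string)."""
    head, rest = v[0], v[1:]
    if isinstance(head, str):                        # [Label]
        return (u + (head,), rest, a, s)
    op = head[0]
    if op == "asg":
        _, x, e = head
        val = e(s)
        if isinstance(val, str):                     # [Asg2]
            return val
        return (u + (head,), rest, a, {**s, x: val}) # [Asg1]
    if op == "cj":
        _, b, l = head
        r = b(s)
        if isinstance(r, str):                       # [Cj3]
            return r
        if r:                                        # [Cj1]: jump not taken
            return (u + (head,), rest, a, s)
        x, ly = split_at(u + v, l)                   # [Cj2]: jump on false
        return (x, ly, a, s)
    if op == "jsr":
        if len(a) >= max_depth:                      # [Jsr2] with StackFull
            return "StackOverflow"
        x, ly = split_at(u + v, head[1])             # [Jsr1]
        return (x, ly, a + ((u + (head,), rest),), s)
    if op == "ret":
        if not a:                                    # [Ret2]
            return "EmptyStack"
        x, y = a[-1]                                 # [Ret1]
        return (x, y, a[:-1], s)
    raise ValueError(f"unknown instruction {head!r}")

def run(m, s, max_depth=8, fuel=10_000):
    """Run m from <eps, m, eps, s>; returns the final state, a failure name,
    or None if no outcome was reached within `fuel` steps."""
    config = ((), tuple(m), (), dict(s))
    for _ in range(fuel):
        u, v, a, st = config
        if not v:
            return st
        nxt = step(u, v, a, st, max_depth)
        if isinstance(nxt, str):
            return nxt
        config = nxt
    return None

def goto(l):
    """goto(l) is cj(false, l)."""
    return ("cj", lambda s: False, l)

# The factorial program of this section.
FACT = (
    ("asg", "y", lambda s: 1), "Loop",
    ("cj", lambda s: s["x"] != 0, "End"),
    ("asg", "y", lambda s: s["x"] * s["y"]),
    ("asg", "x", lambda s: s["x"] - 1),
    goto("Loop"), "End",
)
```

For instance, `run(FACT, {"x": 4, "y": 0})` ends in a state with `y` equal to 24, while an unboundedly recursive program such as `("P", ("jsr", "P"))` reports "StackOverflow" once the assumed bound is reached.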

The evaluation of $m$ in state $s$ starts in the initial configuration
$\langle \varepsilon, m, \varepsilon, s \rangle$, i.e. with the first
instruction of $m$ and with an empty stack. Execution terminates regularly if a
configuration of the form $\langle m, \varepsilon, a, s' \rangle$ is reached;
other possible outcomes are reachable error configurations $\omega$, and
$\infty$, if there is an infinite sequence of transitions from
$\langle \varepsilon, m, \varepsilon, s \rangle$. Based on this intuition, we
could now define a relational semantics $R(m)$ for a given program
$m \in \mathit{CMP}$ following the lines of the definition below. $R(m)$ would
give rise to a family of predicate transformers $\mathrm{wrp}_A(m)$. Up to this
point $\mathrm{wrp}_A(m)$ would be known only with reference to the operational
semantics. In order to allow reasoning on a more abstract level we would like
to derive sufficiently strong laws about $\mathrm{wrp}_A(m)$ from the
operational semantics first; afterwards we would use just these laws in our
reasoning without referring directly to the operational semantics.

Unfortunately, this approach fails for $\mathrm{wrp}_A(m)$: only very weak laws
can be established. The main problem is that the behavior of jump and
jump-subroutine instructions cannot adequately be described without having
context information available. We, therefore, work with a semantics of machine
programs that takes the sequential context as well as the stack context into
account.

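For $(u, v) \in \mathit{PMP}$ and $a \in \mathit{PMP}^*$ we define

\[
\begin{array}{rcl}
R(u, v, a) & = & \{ (s, s') \mid \exists u', a' : \langle u, v, a, s \rangle \to^* \langle u', \varepsilon, a', s' \rangle \} \\
 & \cup & \{ (s, \omega) \mid \langle u, v, a, s \rangle \to^* \omega \} \\
 & \cup & \{ (s, \infty) \mid \langle u, v, a, s \rangle \to^{\infty} \} ,
\end{array}
\]

where $\to^*$ denotes the reflexive and transitive closure of $\to$, and
$\to^{\infty}$ the existence of an infinite path. This definition induces a
family of predicate transformers $\mathrm{wrp}_A(u, v, a)$ and it is this
family that we are using in our reasoning. We can define $R(m)$ and
$\mathrm{wrp}_A(m)$ by $R(m) = R(\varepsilon, m, \varepsilon)$ and
$\mathrm{wrp}_A(m) = \mathrm{wrp}_A(\varepsilon, m, \varepsilon)$.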

The laws in Table 2 can now be proved from the operational semantics.
Technically these laws are just derived properties but they can also be read as
axioms about the total behavior of a machine. Law [Asg-wrp], e.g., tells us about
a machine started in a situation where it executes an asg-instruction first: its
total behavior can safely be assumed to be composed of the respective assignment
to $x$ and the total behavior of a machine started just after the assignment
instruction. The other laws have a similar interpretation. Together the laws
allow a kind of symbolic execution of assembler programs. But we do not have to
refer to low-level concepts like execution sequences; instead we can use more
abstract properties, e.g., that $\ge$ is an ordering.

All these laws can be strengthened to equalities. We state them as inequalities
in order to stress that just one direction is needed in the following.
Refinement allows us to use safe approximations on the right-hand side instead
of fully accurate descriptions. This makes it possible to reason safely about
instructions whose effect is either difficult to capture or not fully specified
by the manufacturer of the processor [9]. If, for example, [Jsr1] and [Jsr2]
are furnished with a condition StackFull as discussed above, the refinement
inequality stated in [Jsr-wrp] becomes proper, because jsr would definitely
lead to the acceptable error "StackOverflow" if StackFull holds. Therefore, the
PT on the left-hand side would succeed for all states satisfying StackFull
irrespective of the postcondition, while the right-hand side may fail.

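Note that the premise "StackOverflow" $\in A$ of the law [Jsr-wrp] is
essential. If "StackOverflow" is considered unacceptable
("StackOverflow" $\notin A$), we have
$\mathrm{wrp}_A(u, \mathrm{jsr}(l) \cdot v, a) = \bot$ as a consequence of
[Jsr2]. This means that jsr cannot be used to implement any non-trivial
statement. If the more precise operational model with a StackFull predicate is
used, $\mathrm{wrp}_A(u, \mathrm{jsr}(l) \cdot v, a)$ is better than $\bot$ but
any non-trivial approximation will involve the StackFull predicate. This would
force us to keep track of the storage requirements when we head for a verified
compilation. As the recursion depth of programs is in general not computable,
we could not justify translation of arbitrary recursive procedures.

Table 2. wrp-laws for the assembler language

\[
\begin{array}{lrcl}
[\text{Asg-wrp}] & \mathrm{wrp}_A(u, \mathrm{asg}(x,e) \cdot v, a) & \ge & (x :=_A e) \; ; \; \mathrm{wrp}_A(u \cdot \mathrm{asg}(x,e), v, a) \\[1ex]
[\text{Cj-wrp}] & \mathrm{wrp}_A(u, \mathrm{cj}(b,l) \cdot v, a) & \ge & \mathrm{wrp}_A(u \cdot \mathrm{cj}(b,l), v, a) \lhd b/A \rhd \mathrm{wrp}_A(x, l \cdot y, a) , \\
 & & & \text{if } u \cdot \mathrm{cj}(b,l) \cdot v = x \cdot l \cdot y \\[1ex]
[\text{Goto-wrp}] & \mathrm{wrp}_A(u, \mathrm{goto}(l) \cdot v, a) & \ge & \mathrm{wrp}_A(x, l \cdot y, a) , \\
 & & & \text{if } u \cdot \mathrm{goto}(l) \cdot v = x \cdot l \cdot y \\[1ex]
[\text{Jsr-wrp}] & \mathrm{wrp}_A(u, \mathrm{jsr}(l) \cdot v, a) & \ge & \mathrm{wrp}_A(x, l \cdot y, a \cdot (u \cdot \mathrm{jsr}(l), v)) , \\
 & & & \text{if } u \cdot \mathrm{jsr}(l) \cdot v = x \cdot l \cdot y \text{ and "StackOverflow"} \in A \\[1ex]
[\text{Ret-wrp}] & \mathrm{wrp}_A(u, \mathrm{ret} \cdot v, a \cdot (x, y)) & \ge & \mathrm{wrp}_A(x, y, a) \\[1ex]
[\text{Label-wrp}] & \mathrm{wrp}_A(u, l \cdot v, a) & \ge & \mathrm{wrp}_A(u \cdot l, v, a) \\[1ex]
[\text{Term-wrp}] & \mathrm{wrp}_A(u, \varepsilon, a) & \ge & \mathrm{Id}
\end{array}
\]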



5 A Simple High-Level Language 

As a prototypic instance of a high-level language we consider a While-language 
with parameterless, nested procedures. Such a language is adequate for studying 
the control-flow aspects of translation of ALGOL-like programming languages. 

Syntax. We define the set of programs, $\mathit{Prog}$, by the following
grammar. In order to distinguish programs clearly from the corresponding
semantic predicate transformers from Sect. 3 we use an abstract kind of syntax.

\[
\pi ::= \mathrm{assign}(x, e) \mid \mathrm{seq}(\pi_1, \pi_2) \mid \mathrm{if}(b, \pi_1, \pi_2) \mid \mathrm{while}(b, \pi) \mid \mathrm{call}(p) \mid \mathrm{blk}(p, \pi_p, \pi_b)
\]

In this grammar, $x$ ranges over the variables in $\mathit{Var}$, $b$ and $e$
over $\mathit{BExpr}$ and $\mathit{Expr}$, and $p$ over a set
$\mathit{ProcName}$ of procedure identifiers.

$\mathrm{blk}(p, \pi_p, \pi_b)$ is a block in which a (possibly recursive)
local procedure $p$ with body $\pi_p$ is declared. $\pi_b$ is the body of the
block; it might call $p$ as well as more globally defined procedures. The
semantics below ensures static scoping and so the translation of the next
section has to guarantee static scoping as well. Note that nesting of procedure
declarations and even re-declaration is allowed. Our exposition generalizes
straightforwardly to blocks in which a system of mutually recursive procedures
can be declared instead of just a single procedure. We refrain from treating
this more general case only because it burdens the notation a bit without
bringing more insight. The intuitive semantics of the other syntactic operators
should be clear from their names.

Semantics. Now we furnish the While-language with a predicate transformer
semantics. Due to lack of space, we cannot follow the lines from the last
section; instead we postulate the resulting predicate transformer semantics
directly. Nevertheless the following definitions should be read as laws derived
from a more concrete semantics. In [10] we justified such definitions briefly
for a language without procedures.

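In order to give a compositional semantics, we refer as usual to environments
$\eta \in \mathit{Env} \stackrel{\mathrm{def}}{=} (\mathit{ProcName} \to \mathit{PTrans})$,
mapping procedure identifiers to the weakest relative precondition transformer
of their body. The environment is taken by wrp as an additional argument
written as a superscript.

\[
\begin{array}{lcl}
\mathrm{wrp}^{\eta}_A(\mathrm{assign}(x, e)) & = & (x :=_A e) \\
\mathrm{wrp}^{\eta}_A(\mathrm{seq}(\pi_1, \pi_2)) & = & \mathrm{wrp}^{\eta}_A(\pi_1) \; ; \; \mathrm{wrp}^{\eta}_A(\pi_2) \\
\mathrm{wrp}^{\eta}_A(\mathrm{if}(b, \pi_1, \pi_2)) & = & \mathrm{wrp}^{\eta}_A(\pi_1) \lhd b/A \rhd \mathrm{wrp}^{\eta}_A(\pi_2) \\
\mathrm{wrp}^{\eta}_A(\mathrm{while}(b, \pi)) & = & \lambda \mathcal{W} \\
\mathrm{wrp}^{\eta}_A(\mathrm{call}(p)) & = & \eta(p) \\
\mathrm{wrp}^{\eta}_A(\mathrm{blk}(p, \pi_p, \pi_b)) & = & \mathrm{wrp}^{\eta[p \mapsto \lambda \mathcal{V}]}_A(\pi_b)
\end{array}
\]

In the clauses for while and blk, $\lambda = \nu$ if $\infty \in A$, and
$\lambda = \mu$ otherwise, i.e. we have to take the greatest fixpoint if
divergence is accepted (as in partial correctness semantics) and the least
fixpoint otherwise (see [10]).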

Let us discuss briefly each of the clauses in turn. The assignment law takes
advantage of the assignment combinator defined in Sect. 3. The weakest
precondition of a sequential composition is the weakest precondition of the
first statement establishing the weakest precondition of the second statement.
A conditional's weakest precondition depends on the validity of the guard.
Operationally a loop is unrolled as long as the guard holds, hence the weakest
precondition PT of a loop is a fixpoint of the well-known semantic function
$\mathcal{W} : \mathit{PTrans} \to \mathit{PTrans}$, where
$\mathcal{W}(X) = (\mathrm{wrp}^{\eta}_A(\pi) \; ; \; X) \lhd b/A \rhd \mathrm{Id}$.
Application of the environment in question captures the call-case. A block's
weakest precondition in some given environment is the weakest precondition of
the body in a varied environment that contains a new binding for the local
procedure declared in that block. The weakest precondition of that procedure is
a fixpoint of the function $\mathcal{V} : \mathit{PTrans} \to \mathit{PTrans}$,
where $\mathcal{V}(X) = \mathrm{wrp}^{\eta[p \mapsto X]}_A(\pi_p)$.

Complete programs are interpreted in the environment $\bot_{\mathit{Env}}$ that
binds all procedures to the $\bot$ predicate transformer, because otherwise the
call of an undeclared procedure would miraculously have a non-trivial meaning.
Hence, when comparing a complete program $\pi$ to its translation, we refer to
$\mathrm{wrp}^{\bot_{\mathit{Env}}}_A(\pi)$.



6 Specification of Compilation 



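In Table 3 we inductively define a compiling relation
$\mathcal{C} \subseteq \mathit{Prog} \times \mathit{MP} \times \mathit{Dict}$.
Here $\mathit{Dict} = (\mathit{ProcName} \rightharpoonup \mathit{Lab})$ is the
set of dictionaries that intuitively map procedure names to labels where code
for the corresponding body can be found. We have $\mathcal{C}(\pi, m, \delta)$
if machine program $m$ is a possible compiling result of source program $\pi$
assuming that dictionary $\delta$ assigns appropriate labels to free procedure
names. The program

\[
\mathrm{seq}(\mathrm{assign}(y, 1), \mathrm{while}(x > 0, \mathrm{seq}(\mathrm{assign}(y, x * y), \mathrm{assign}(x, x - 1)))) ,
\]

for instance, may be compiled to the assembler program computing the factorial
function in Sect. 4 irrespective of the dictionary $\delta$.

Table 3. Compiling relation

\[
\begin{array}{ll}
[\text{Assign}] & \mathcal{C}(\mathrm{assign}(x, e),\ \mathrm{asg}(x, e),\ \delta) \\[2ex]
[\text{Seq}] & \dfrac{\mathcal{C}(\pi_1, m_1, \delta), \quad \mathcal{C}(\pi_2, m_2, \delta)}{\mathcal{C}(\mathrm{seq}(\pi_1, \pi_2),\ m_1 \cdot m_2,\ \delta)} \\[2.5ex]
[\text{If}] & \dfrac{\mathcal{C}(\pi_1, m_1, \delta), \quad \mathcal{C}(\pi_2, m_2, \delta)}{\mathcal{C}(\mathrm{if}(b, \pi_1, \pi_2),\ \mathrm{cj}(b, l_1) \cdot m_1 \cdot \mathrm{goto}(l_2) \cdot l_1 \cdot m_2 \cdot l_2,\ \delta)} \\[2.5ex]
[\text{While}] & \dfrac{\mathcal{C}(\pi, m, \delta)}{\mathcal{C}(\mathrm{while}(b, \pi),\ l_0 \cdot \mathrm{cj}(b, l_1) \cdot m \cdot \mathrm{goto}(l_0) \cdot l_1,\ \delta)} \\[2.5ex]
[\text{Call}] & \dfrac{p \in \mathrm{dom}(\delta)}{\mathcal{C}(\mathrm{call}(p),\ \mathrm{jsr}(\delta(p)),\ \delta)} \\[2.5ex]
[\text{Blk}] & \dfrac{\mathcal{C}(\pi_p, m_p, \delta[p \mapsto l_p]), \quad \mathcal{C}(\pi_b, m_b, \delta[p \mapsto l_p])}{\mathcal{C}(\mathrm{blk}(p, \pi_p, \pi_b),\ \mathrm{goto}(l_b) \cdot l_p \cdot m_p \cdot \mathrm{ret} \cdot l_b \cdot m_b,\ \delta)}
\end{array}
\]

Note that the typing constraint $m \in \mathit{MP}$ guarantees that target
programs are labeled uniquely. An advantage of a relational specification over
a functional compiling-function is that certain aspects, like the choice of
labels here, can be left open for a later design stage of the compiler.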
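
One possible functional refinement of the compiling relation resolves the open choice of labels with a counter. The Python sketch below is only an illustration and not the compiler of the paper: the tuple encoding of programs, the label naming scheme, the representation of expressions as opaque strings, and the helper names are all assumptions introduced here.

```python
import itertools

def goto(label):
    """goto(l) abbreviates cj(false, l); "false" is an opaque expression here."""
    return ("cj", "false", label)

def make_compiler():
    """Return a compile function with its own supply of fresh labels."""
    counter = itertools.count()

    def fresh(prefix):
        return f"{prefix}{next(counter)}"

    def compile_(pi, delta):
        tag = pi[0]
        if tag == "assign":                                   # [Assign]
            return (("asg", pi[1], pi[2]),)
        if tag == "seq":                                      # [Seq]
            return compile_(pi[1], delta) + compile_(pi[2], delta)
        if tag == "if":                                       # [If]
            _, b, p1, p2 = pi
            l1, l2 = fresh("else"), fresh("fi")
            return ((("cj", b, l1),) + compile_(p1, delta) + (goto(l2), l1)
                    + compile_(p2, delta) + (l2,))
        if tag == "while":                                    # [While]
            _, b, body = pi
            l0, l1 = fresh("loop"), fresh("end")
            return (l0, ("cj", b, l1)) + compile_(body, delta) + (goto(l0), l1)
        if tag == "call":                                     # [Call]
            p = pi[1]
            if p not in delta:          # premise: p must be in dom(delta)
                raise KeyError(f"undeclared procedure {p!r}")
            return (("jsr", delta[p]),)
        if tag == "blk":                                      # [Blk]
            _, p, body_p, body_b = pi
            lp, lb = fresh("proc"), fresh("body")
            inner = {**delta, p: lp}    # static scoping: delta[p -> lp]
            return ((goto(lb), lp) + compile_(body_p, inner)
                    + (("ret",), lb) + compile_(body_b, inner))
        raise ValueError(f"unknown construct {tag!r}")

    return compile_

def labels_of(m):
    """Defining occurrences of labels in a machine program."""
    return [item for item in m if isinstance(item, str)]

FACT_SRC = ("seq", ("assign", "y", "1"),
            ("while", "x > 0",
             ("seq", ("assign", "y", "x * y"), ("assign", "x", "x - 1"))))
fact_code = make_compiler()(FACT_SRC, {})
```

Because every label comes from a single counter, the generated program is labeled uniquely, which is the constraint $m \in \mathit{MP}$ demands; any other injective naming scheme would satisfy the relation equally well.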

7 Correctness of Compilation 

This section is concerned with proving correctness of the translation specified
in the previous section. As discussed in the introduction, the translation
cannot be correct in the sense of preservation of total correctness (PTC), as
our assembler language might report "StackOverflow" on executing a jsr
instruction and thus regularly terminating source programs might be compiled to
target programs that do not terminate regularly. Nevertheless, source programs
that do not diverge are never compiled to diverging target programs. But PTC
identifies divergence and runtime errors and, therefore, cannot treat this
scenario appropriately. A main purpose of this paper is to show how the greater
selectivity of $\mathrm{wrp}_A$-based reasoning allows a more adequate
treatment by an appropriate choice of $A$. We treat "StackOverflow" as an
acceptable outcome but $\infty$ as an unacceptable one. This gives rise to a
relativized version of PTC. We comment on the proof for relativized versions of
PPC in the conclusion.

Theorem 2. Suppose $\infty \notin A$ and "StackOverflow" $\in A$. Then for all
$\pi$, $m$:

\[
\mathcal{C}(\pi, m, \emptyset) \Rightarrow \mathrm{wrp}_A(m) \ge \mathrm{wrp}^{\bot_{\mathit{Env}}}_A(\pi) .
\]

Thus, if a program $\pi$ is compiled to an assembler program $m$ in an empty
dictionary, relative correctness is preserved. Note that the premise of the
compiling rule [Call] guarantees that non-closed programs cannot be compiled
with an empty dictionary.

When we try to prove Theorem 2 by a structural induction we encounter two
problems. Firstly, when machine programs are put together to implement composed
programs, as in the [Seq] or [If] rule, the induction hypothesis cannot be
applied directly because it is concerned with code for the components in
isolation while, in the composed code, the code runs in the context of other
code. Our approach to this problem is to establish a stronger claim that
involves a universal quantification over all contexts. More specifically, we
show
$\mathrm{wrp}_A(u, m \cdot v, a) \ge \mathrm{wrp}^{\eta}_A(\pi) \; ; \; \mathrm{wrp}_A(u \cdot m, v, a)$
for all surrounding code sequences $u$, $v$ and stack contexts $a$. Note how
the sequential composition with $\mathrm{wrp}_A(u \cdot m, v, a)$ on the
right-hand side beautifully expresses that $m$ transfers control to the
subsequent code and that the stack is left unchanged.

Secondly, when considering the call-case, some knowledge about the bindings in
the dictionary $\delta$ is needed. To solve this problem we use the following
predicate.

\[
\begin{array}{rl}
\mathrm{fit}(\eta, \delta, u) \iff \forall q \in \mathrm{dom}(\delta) : \exists x, y : & x \cdot \delta(q) \cdot y = u \ \wedge \\
 & \forall e, f, g : \mathrm{wrp}_A(x, \delta(q) \cdot y, g \cdot (e, f)) \ge \eta(q) \; ; \; \mathrm{wrp}_A(e, f, g) .
\end{array}
\]

It expresses that the bindings in $\delta$ together with the assembler code $u$
'fit' the bindings in the semantic environment $\eta$. The first conjunct says
that the context provides a corresponding label for each procedure $q$ bound by
$\delta$; the second conjunct tells us that the code following this label
implements $q$'s binding in $\eta$ and proceeds with the code on top of the
return stack. This is just what is needed in the call-case of the induction.
The code generated for blocks has to ensure that this property remains valid
for newly declared procedures.

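Putting the pieces together we are going to prove the following.

Lemma 3. Suppose $\infty \notin A$ and "StackOverflow" $\in A$. For all
$\pi, m, u, v, a, \eta, \delta$:

\[
\mathcal{C}(\pi, m, \delta) \wedge \mathrm{fit}(\eta, \delta, u \cdot m \cdot v) \Rightarrow \mathrm{wrp}_A(u, m \cdot v, a) \ge \mathrm{wrp}^{\eta}_A(\pi) \; ; \; \mathrm{wrp}_A(u \cdot m, v, a) .
\]

Theorem 2 follows by the instantiation $u = v = \varepsilon$,
$a = \varepsilon$, $\eta = \bot_{\mathit{Env}}$, $\delta = \emptyset$ using the
[Term-wrp] law and the fact that
$\mathrm{wrp}_A(m) = \mathrm{wrp}_A(\varepsilon, m, \varepsilon)$.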
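
Written out, the instantiation reads as follows. This is a sketch of the omitted routine step, where $\eta_0$ stands for the environment in which complete programs are interpreted (Sect. 5); the first inequality is Lemma 3, whose fit-premise holds vacuously because the empty dictionary has an empty domain, and the second uses [Term-wrp] together with monotonicity of sequential composition.

\[
\begin{array}{cl}
 & \mathrm{wrp}_A(m) \\
= & \{ \text{definition of } \mathrm{wrp}_A(m) \} \\
 & \mathrm{wrp}_A(\varepsilon, m \cdot \varepsilon, \varepsilon) \\
\ge & \{ \text{Lemma 3 with } u = v = a = \varepsilon,\ \delta = \emptyset;\ \mathrm{fit}(\eta_0, \emptyset, m) \text{ holds vacuously} \} \\
 & \mathrm{wrp}^{\eta_0}_A(\pi) \; ; \; \mathrm{wrp}_A(m, \varepsilon, \varepsilon) \\
\ge & \{ \text{law [Term-wrp], monotonicity of } ; \} \\
 & \mathrm{wrp}^{\eta_0}_A(\pi) \; ; \; \mathrm{Id} \\
= & \{ \mathrm{Id} \text{ is the unit of } ; \} \\
 & \mathrm{wrp}^{\eta_0}_A(\pi) .
\end{array}
\]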

8 Proof of Lemma 3 

The proof is by structural induction on $\pi$. So consider some arbitrarily
chosen $\pi, m, u, v, a, \eta, \delta$ such that $\mathcal{C}(\pi, m, \delta)$
and $\mathrm{fit}(\eta, \delta, u \cdot m \cdot v)$, and assume that for all
component programs the claim of Lemma 3 holds. As usual, we proceed by a case
analysis on the structure of $\pi$. In each case we perform a kind of 'symbolic
execution' of the corresponding assembler code using the wrp-laws from Sect. 4.
The assumptions about fit will solve the call-case elegantly; the while- and
blk-cases moreover involve some fixpoint reasoning.

Due to lack of space we can discuss here only the cases concerned with
procedures: call and blk.

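Case a.) $\pi = \mathrm{call}(p)$. By the [Call] rule,
$m = \mathrm{jsr}(\delta(p))$ and $p \in \mathrm{dom}(\delta)$. As a
consequence of $\mathrm{fit}(\eta, \delta, u \cdot m \cdot v)$ there exist $x$
and $y$ such that
$x \cdot \delta(p) \cdot y = u \cdot \mathrm{jsr}(\delta(p)) \cdot v$. Now,

\[
\begin{array}{cl}
 & \mathrm{wrp}_A(u, \mathrm{jsr}(\delta(p)) \cdot v, a) \\
\ge & \{ \text{Law [Jsr-wrp], "StackOverflow"} \in A, \text{ existence of } x \text{ and } y \} \\
 & \mathrm{wrp}_A(x, \delta(p) \cdot y, a \cdot (u \cdot \mathrm{jsr}(\delta(p)), v)) \\
\ge & \{ \text{Second conjunct of } \mathrm{fit}(\eta, \delta, u \cdot m \cdot v) \} \\
 & \eta(p) \; ; \; \mathrm{wrp}_A(u \cdot \mathrm{jsr}(\delta(p)), v, a) \\
= & \{ \text{Definition of call semantics} \} \\
 & \mathrm{wrp}^{\eta}_A(\pi) \; ; \; \mathrm{wrp}_A(u \cdot \mathrm{jsr}(\delta(p)), v, a) .
\end{array}
\]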

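Case b.) $\pi = \mathrm{blk}(p, \pi_p, \pi_b)$. By the [Blk] rule, there are
assembler programs $m_p$, $m_b$ and labels $l_p$, $l_b$ such that
$m = \mathrm{goto}(l_b) \cdot l_p \cdot m_p \cdot \mathrm{ret} \cdot l_b \cdot m_b$
and $\mathcal{C}(\pi_p, m_p, \delta[p \mapsto l_p])$ and
$\mathcal{C}(\pi_b, m_b, \delta[p \mapsto l_p])$ hold.

We would like to calculate as follows:

\[
\begin{array}{cl}
 & \mathrm{wrp}_A(u, \mathrm{goto}(l_b) \cdot l_p \cdot m_p \cdot \mathrm{ret} \cdot l_b \cdot m_b \cdot v, a) \\
\ge & \{ \text{Laws [Goto-wrp] and [Label-wrp]} \} \\
 & \mathrm{wrp}_A(u \cdot \mathrm{goto}(l_b) \cdot l_p \cdot m_p \cdot \mathrm{ret} \cdot l_b, m_b \cdot v, a) \\
\ge & \{ \text{Induction hypothesis: } \mathcal{C}(\pi_b, m_b, \delta[p \mapsto l_p]) \text{ holds} \} \\
 & \mathrm{wrp}^{\eta[p \mapsto \mu\mathcal{V}]}_A(\pi_b) \; ; \; \mathrm{wrp}_A(u \cdot \mathrm{goto}(l_b) \cdot l_p \cdot m_p \cdot \mathrm{ret} \cdot l_b \cdot m_b, v, a) \\
= & \{ \text{Definition of block semantics} \} \\
 & \mathrm{wrp}^{\eta}_A(\mathrm{blk}(p, \pi_p, \pi_b)) \; ; \; \mathrm{wrp}_A(u \cdot \mathrm{goto}(l_b) \cdot l_p \cdot m_p \cdot \mathrm{ret} \cdot l_b \cdot m_b, v, a) .
\end{array}
\]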

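In order to apply the induction hypothesis in the second step, however, we have
to check
$\mathrm{fit}(\eta[p \mapsto \mu\mathcal{V}], \delta[p \mapsto l_p], u \cdot m \cdot v)$,
i.e. that for all $q \in \mathrm{dom}(\delta[p \mapsto l_p])$

\[
\begin{array}{rl}
\exists x, y : & x \cdot \delta[p \mapsto l_p](q) \cdot y = u \cdot m \cdot v \ \wedge \\
 & \forall e, f, g : \mathrm{wrp}_A(x, \delta[p \mapsto l_p](q) \cdot y, g \cdot (e, f)) \ge \eta[p \mapsto \mu\mathcal{V}](q) \; ; \; \mathrm{wrp}_A(e, f, g) .
\end{array}
\qquad (2)
\]

So suppose given $q \in \mathrm{dom}(\delta[p \mapsto l_p])$. If $q \neq p$,
(2) reduces to

\[
\begin{array}{rl}
\exists x, y : & x \cdot \delta(q) \cdot y = u \cdot m \cdot v \ \wedge \\
 & \forall e, f, g : \mathrm{wrp}_A(x, \delta(q) \cdot y, g \cdot (e, f)) \ge \eta(q) \; ; \; \mathrm{wrp}_A(e, f, g) ,
\end{array}
\]

which follows directly from $\mathrm{fit}(\eta, \delta, u \cdot m \cdot v)$.
For $q = p$, on the other hand, we must prove

\[
\begin{array}{rl}
\exists x, y : & x \cdot l_p \cdot y = u \cdot m \cdot v \ \wedge \\
 & \forall e, f, g : \mathrm{wrp}_A(x, l_p \cdot y, g \cdot (e, f)) \ge \mu\mathcal{V} \; ; \; \mathrm{wrp}_A(e, f, g) .
\end{array}
\]

Choosing $x = u \cdot \mathrm{goto}(l_b)$ and
$y = m_p \cdot \mathrm{ret} \cdot l_b \cdot m_b \cdot v$ makes the first
conjunct true. The second conjunct is established by a fixpoint induction for
$\mu\mathcal{V}$: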

Admissibility is straightforward and the base case follows easily from the fact 
that _L ; wrp^(e, /, (/) = _L. For the induction step assume that X is given such 
that for all e, f,g 



wrpAix, Ip - y,g ■ (e,f)) > X ;wrp^{e,f,g) . (3) 

Now, fit(? 7 [p I— >■ X], S[p e- >• lp],u ■ m ■ v) holds: for q ^ p we can argue as above 
and for q = p this follows from (3). Thus, by using the induction hypothesis of 
the structural induction applied to we can calculate as follows for arbitrarily 
given e,f,g-. 

wrpA{x,lp-y,g- (e,/)) 

> {Law [Label-wrp] and unfolding of y} 
wrp^(x • Ip, mp ■ ret - lb ■ mb ■ v,g ■ (e, /)) 

> (Induction hypothesis applied to tt^} 
wrp^^*”^^'(7Tp) ; wrp^(a; • Ip ■ mp, ret -lb-mb-v,g ■ (e, /)) 

> (Definition of V and law [Ret-wrp]} 

V{X) ; wrp^ie, f,g) . 

This completes the fixpoint induction. □ 

9 Conclusion 

Two interweaved aspects motivated us to write the present paper. First of all we 
wanted to prove correct translation of a language with procedures to abstract 
assembler code; not just somehow or other but in an elegant and comprehensible 
manner. Algebraic calculations with predicate transformers turned out to be an 
adequate means for languages without procedures (see, e.g., [9]), so we decided to 
apply this technique in the extended scenario, too. The second stimulus is due 
to [10], where we proposed to employ wrp-semantics in compiler proofs. Real 
processors are always limited by their finite memory and a realistic notion of 
translation correctness must be prepared to deal with errors resulting from this 
limitation. We hope that the current paper demonstrates convincingly that wrp- 
based reasoning can cope with finite machines without burdening the verification. 

The target language studied in this paper provides an adequate level of
abstraction for further refinement down to actual binary machine code. The
instructions may be considered as ‘macros’ for instructions of a more concrete
assembler or machine language. Labels facilitate this, as they allow the
destinations of jumps to be described independently of the length of the code.
An interesting aspect of our proof is that it shows how to handle the transition
from tree-structured source programs to ‘flat’ target code. For this purpose we
established a stronger claim that involves a universal quantification over
syntactic target program contexts. This should be contrasted with the use of a
tree-structured assembler language in [11], where translation correctness for a
While-language without procedures is investigated. The proof in [11] does not
immediately generalize to flat code.

Future work includes studying the relativized version of preservation of
partial correctness ($\infty \in A$). In this case, the semantics of recursive constructs
is given by greatest rather than least fixpoints. As a consequence, fixpoint
reasoning based on the fixpoints in the source language does not seem to work.
We intend to use a fixpoint characterization of the target language’s semantics
instead. We are also working on concretizing from the abstract assembler
language towards a realistic processor.

Abstract. We show how a program analysis technique originally developed
for C-like pointer structures can be adapted to analyse the hierarchical
structure of processes in the ambient calculus. The technique is based on
modeling the semantics of the language in a two-valued logic; by
reinterpreting the logical formulae in Kleene’s three-valued logic we obtain an
analysis allowing us to reason about may as well as must properties. The
correctness of the approach follows from a general Embedding Theorem
for Kleene’s logic; furthermore embeddings allow us to reduce the size of
structures so as to control the time and space complexity of the analysis.



1 Introduction 

Mobile ambients. The ambient calculus is a prototype web-language that al- 
lows processes (in the form of mobile ambients) to move inside a hierarchy of 
administrative domains (also in the form of mobile ambients); since the pro- 
cesses may continue to execute during their movement this notion of mobility 
extends that found in Java where only passive code in the form of applets may 
be moved. Mobile ambients were introduced in [1] and have been studied in 
[2,3,4,9,13,16,17]. The calculus is patterned after the π-calculus but focuses on
named ambients and their movement rather than on channel-based communi- 
cation; indeed, already the communication-free fragment of the calculus is very 
powerful (and in particular Turing complete); we review it in Section 2. 

Since processes may evolve when moving around it is hard to predict which 
ambients may turn up inside what other ambients. In this paper we present an 
analysis that allows us to validate whether all executions satisfy properties like: 

— Is there always exactly one copy of the ambient $p$?

— Is $p$ always inside at most one of the ambients $r_1$, $r_2$ and $r_3$?

Kleene’s three-valued logic. In [18] Kleene’s three-valued logic is used to
obtain safe approximations to the shape of dynamically evolving C-like pointer
structures. From a programming language point of view, the setting of the
present paper is vastly different. In contrast to traditional imperative
languages the ambient calculus has no separation of program and data;
furthermore, non-determinism and concurrency are crucial ingredients and hence
the notion of program point is demoted. The central operations on C-like
pointers are assignments, which are executed one at a time and with a local
effect on the heap; this is in contrast to the reductions of the ambient
calculus, which may happen in a number of contexts that are not known a priori,
so that the overall effect is hard to predict. We present an overview of
Kleene’s three-valued logic in Section 3.

Predicate logic as a meta-language. Our overall approach is a meta-language
approach as known for example from the denotational approach to program
analysis [14]. However, here we are based on a predicate logic with equality and
appropriate non-logical predicates as well as operations for their transitive
closure; the choice of non-logical predicates is determined by the programming
language or calculus at hand. To deal with equality we rely on the presence of a
special unary summary predicate indicating whether or not an individual
represents one or more entities. The important point is that the logic must be
powerful enough to express both

— the properties of the configurations that we are interested in, and 

— a postcondition semantics for transitions between configurations. 

From a process algebra point of view, our representation of ambients (presented in 
Section 4) is rather low-level as it operates over structures containing a universe 
of explicit individuals; the non-logical predicates are then used to encode mobile 
ambients within these structures. The benefit of using sets of individuals, over 
the more standard formulation using multisets of subambients, is that it directly 
allows us to use the logical formulae. 

Static analysis. The aim of the analysis is to identify certain invariant pro- 
perties that hold for all executions of the system; from the process algebra point 
of view, the invariants are akin to types and they represent the sets of ambients 
that can arise. Since the set of ambient structures may be infinite, the analysis 
needs to perform an abstraction to remain tractable. 

For a moment let us assume that we continue interpreting the specification 
in a two-valued logic. From the classical program analysis point of view, the 
maxim that “program analysis always errs on the safe side” amounts to saying 
that the truth values false and true of the concrete world are being replaced 
by 0 (for false or cannot) and a value 1/2 (for possibly true or may) in the 
abstract world. As an example, if a reachability analysis says that a given value 
(or ambient) cannot reach a given point (or ambient) then indeed it cannot, but 
if the reachability analysis says that it might then it may be a “false positive” 
due to the imprecision of the analysis. 

The Embedding Theorem. The power of our approach comes from the ability
to reinterpret the transition formulae over Kleene’s three-valued logic. Here we
have the truth values 0 (for false or cannot), 1 (for true or must) and 1/2 (for
possibly true or may) [18]. The benefit of this is that we get a distinction between
may and must properties for free! Returning to the reachability example we are
now able to express certain cases where a value (or ambient) definitely reaches
a given point.

It is straightforward to reinterpret the specification of the semantics over
structures that allow all three truth values (see Section 5). The correctness of
the approximate analysis with respect to the semantics follows using a general
Embedding Theorem that shows that the interpretation in Kleene’s three-valued
logic is conservative over the ordinary two-valued interpretation. Termination
is guaranteed thanks to our techniques for restricting attention to a bounded
universe of individuals and for combining certain structures into one (thereby
possibly introducing more 1/2’s); these techniques generally work by allowing us
to compress structures so that fewer individuals (perhaps only boundedly many)
are needed, thereby allowing us to control the time and space complexity of
our analysis.

2 The Ambient Calculus 

Syntax and informal semantics. We shall study the communication-free
subset of the ambient calculus [1,2] and for simplicity of presentation we shall
follow [2] in dispensing with local names. Given a supply of ambient names $n \in N$
we define the syntax of processes $P \in Proc$ and capabilities $M \in Cap$:

    P ::= 0 | P | P' | !P | n[P] | M.P
    M ::= in n | out n | open n

The first constructs are well-known from the π-calculus. The process 0 is the
inactive process; as usual we shall omit trailing 0’s. We write P | P' for the
parallel composition of the two processes P and P' and we write !P for the
replicated process that can evolve into any number of parallel occurrences of P.

The remaining constructs are specific to the ambient calculus. The construct
n[P] encapsulates the process P in the ambient n. In the basic ambient calculus
a process can perform three operations: it can move into a sibling ambient using
the in n capability, it can move out of the parent ambient using the out n
capability or it can dissolve a sibling ambient using the open n capability. These
operations are illustrated pictorially in Fig. 1 where we draw the processes as
trees: the nodes of the trees are labelled with names and capabilities and the
subtrees represent parallel processes “inside” the parent. The figure expresses
that when a process matches the upper part of one of the rules then it can be
replaced by a process of the form specified by the lower part. The reduction can
take place in a subprocess occurring deeply inside several ambients; however,
capabilities always have to be executed sequentially.
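As a rough illustration of the syntax and of the open capability described above, the following Python sketch represents a parallel composition as a tuple of components and enumerates the results of firing `open n` against a sibling ambient named `n`. The names `Amb`, `Cap` and `open_steps` are hypothetical and chosen only for this sketch; replication, the in and out capabilities, and reduction inside nested ambients are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Amb:
    """An ambient n[P]; `body` is the tuple of parallel components of P."""
    name: str
    body: tuple = ()


@dataclass(frozen=True)
class Cap:
    """A prefixed process M.P with M = kind name; `cont` is the tuple for P."""
    kind: str   # "in", "out" or "open"
    name: str
    cont: tuple = ()


def open_steps(par):
    """Return every process obtained by one top-level `open n` step.

    `par` is a tuple of parallel components. An `open n.P` component may
    dissolve any sibling ambient `n[Q]`, leaving P and Q running in parallel.
    """
    results = []
    for i, c in enumerate(par):
        if isinstance(c, Cap) and c.kind == "open":
            for j, a in enumerate(par):
                if isinstance(a, Amb) and a.name == c.name:
                    rest = tuple(x for k, x in enumerate(par) if k not in (i, j))
                    results.append(rest + c.cont + a.body)
    return results
```

For instance, `open_steps((Cap("open", "r"), Amb("r", (Cap("in", "p"),))))` yields a single successor in which the contents of `r` have been released to the top level.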

Example 1. Throughout the paper we shall consider the following example:

    p[ in r1. ! open r ] | r1[ ! r[ in p. out r1. in r2 ] ]
      | r2[ ! r[ in p. out r2. in r3 ] | ! r[ in p. out r2. in r1 ] ]
      | r3[ ]

It illustrates how a package p is first passed to the site r1, then to r2 from which
it is either passed to r3 or back to r1.

Fig. 1. Semantics of capabilities and replication.

Formal semantics. Formally the semantics is specified by a structural
congruence relation $P \equiv Q$ allowing us to rearrange the appearance of processes
(e.g. corresponding to reordering the descendants of a node in a tree) and a
reduction relation $P \to Q$ modelling the actual computation; the only deviation
from [1] is that the semantics of replication is part of the transition relation
rather than the congruence relation (see Fig. 1).



3 A Primer on Three-valued Logic 



Syntax. It will be convenient to use a slight generalization of the logic used
in [18]. Let $Pr[k]$ denote the set of predicate symbols of arity $k$ and let
$Pr = \bigcup_k Pr[k]$ be their finite union. We shall write = for the equality predicate and
furthermore we shall assume that $Pr[1]$ contains a special predicate sm. Here sm
stands for “summary-predicate” and we shall later interpret it as meaning that
its argument might represent multiple individuals. Without loss of generality
we exclude constant and function symbols from our logic; instead we encode
constant symbols as unary predicates and n-ary functions as (n+1)-ary predicates.

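We write formulae over $Pr$ using the logical connectives $\vee$, $\neg$ and the
quantifier $\exists$; the formal syntax is:

\[
\begin{array}{rcll}
\varphi & ::= & v_1 = v_2 & \text{equality on individuals} \\
 & \mid & p(v_1, v_2, \ldots, v_k) & \text{predicate value, } p \in Pr[k] \\
 & \mid & R^k(v_1, v_2, \ldots, v_k) & \text{application of second order free variable} \\
 & \mid & p^{+}(v_1, v_2) & \text{transitive closure of a relation } p,\ p \in Pr[2] \\
 & \mid & \varphi_1 \vee \varphi_2 & \text{disjunction} \\
 & \mid & \neg \varphi_1 & \text{negation} \\
 & \mid & \exists v_1 : \varphi_1 & \text{(first order) existential quantification}
\end{array}
\]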



Capital letters of the form $R^k$ are used for (second-order) relations of arity $k$. We
also use several shorthands: $\forall v : \varphi$ stands for $\neg \exists v : \neg \varphi$ and $\varphi_1 \wedge \varphi_2$ stands for
$\neg(\neg \varphi_1 \vee \neg \varphi_2)$. The above shorthands are useful since three-valued logic formulae
obey De Morgan laws. Also $\varphi_1 \Rightarrow \varphi_2$ stands for $(\neg \varphi_1 \vee \varphi_2)$ and $v_1 \neq v_2$ stands
for $\neg(v_1 = v_2)$. Finally, we assume the standard associativity and precedence
rules.

Table 1. Kleene’s three-valued interpretation of the propositional operators.

     ∧  |  0    1   1/2         ∨  |  0    1   1/2         ¬  |
    ----+---------------       ----+---------------       ----+-----
     0  |  0    0    0          0  |  0    1   1/2         0  |  1
     1  |  0    1   1/2         1  |  1    1    1          1  |  0
    1/2 |  0   1/2  1/2        1/2 | 1/2   1   1/2        1/2 | 1/2

Semantics. A two-valued interpretation of the language of formulae over $Pr$ is
a structure $T = \langle U, \iota \rangle_2$, where $U$ is a set of individuals and $\iota$ maps each predicate
symbol $p$ of arity $k$ to a truth-valued function:

\[ \iota : Pr[k] \to U^k \to \{0, 1\}. \]

A three-valued interpretation is then a structure $T = \langle U, \iota \rangle_3$, where now $\iota$ maps
each predicate symbol $p$ of arity $k$ to a truth-valued function:

\[ \iota : Pr[k] \to U^k \to \{0, 1, 1/2\}. \]

We use Kleene’s three-valued semantics which operates with the three values: 0,
1 and 1/2. The values 0 and 1 are called definite values and 1/2 is an indefinite
value. The informal semantics of this logic is given in Table 1 where 1/2
represents situations where the result may be either true or false. Alternatively, think
of 1 as representing {true}, 0 as representing {false}, and 1/2 as representing
{false, true}. In the propositional case, our presentation of three-valued logics
here follows [7, Chapter 8]. We shall omit the subscripts 2 and 3 when it is clear
from the context whether we are in a two-valued or a three-valued world.
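The connectives of Table 1 can be sketched in a few lines of Python. The encoding below is an assumption made for illustration only: truth values are exact fractions, conjunction is the minimum, disjunction the maximum and negation is `1 - x`, which reproduces the three tables entry by entry. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

ZERO, HALF, ONE = Fraction(0), Fraction(1, 2), Fraction(1)
VALUES = (ZERO, ONE, HALF)


def k_and(a, b):
    """Kleene conjunction: the less true of the two arguments."""
    return min(a, b)


def k_or(a, b):
    """Kleene disjunction: the more true of the two arguments."""
    return max(a, b)


def k_not(a):
    """Kleene negation: swaps 0 and 1 and leaves 1/2 unchanged."""
    return ONE - a


def k_implies(a, b):
    """The shorthand: an implication stands for (not a) or b."""
    return k_or(k_not(a), b)


def de_morgan_holds():
    """Check that conjunction equals its De Morgan expansion on all inputs."""
    return all(
        k_and(a, b) == k_not(k_or(k_not(a), k_not(b)))
        for a, b in product(VALUES, repeat=2)
    )
```

Encoding the values as numbers is a design choice of the sketch; a set-based encoding ({true}, {false}, {false, true}) would serve equally well.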

The semantics is rather standard given Table 1; due to space limitations we
dispense with the formalisation. There is, however, one slightly non-standard
aspect, namely the meaning of equality (denoted by the symbol ‘=’). This comes
from our desire to compactly represent multiple concrete elements with the same
“abstract element”. Therefore, the meaning of the predicate = is defined in terms
of the unary summary predicate, sm, that expresses that an individual represents
more than one entity, and the equality, =, upon individuals:

— Non-identical individuals are not equal: $u_1 = u_2$ yields 0 if $u_1 \neq u_2$.

— A non-summary individual is equal to itself: $u = u$ yields 1 if $sm(u) = 0$.

— A summary individual may be equal to itself: $u = u$ yields 1/2 if $sm(u) = 1/2$.

Here we exploit the fact that $sm(u)$ is never allowed to give 1. This will be
made more precise when defining the notion of plain (two-valued) and blurred
(three-valued) structures in Section 4.

Table 2. The intended meaning of the predicate symbols.

    predicate       intended meaning
    pa(v1, v2)      is v2 an immediate parent of v1?
    m(v)            does v denote an occurrence of an ambient named m?
    in m(v)         does v denote an occurrence of an action in m?
    out m(v)        does v denote an occurrence of an action out m?
    open m(v)       does v denote an occurrence of an action open m?
    !(v)            does v denote an occurrence of a replicator operation !?
    sm(v)           does v represent more than one element?

Since we are interested in program analysis it is important to observe that
there is an information ordering $\sqsubseteq$ where $l_1 \sqsubseteq l_2$ denotes that $l_1$ has more definite
information than $l_2$; formally, $l_1 \sqsubseteq l_2$ if and only if $l_1 = l_2$ or $l_2 = 1/2$. We write
$\sqcup$ for the join operation with respect to $\sqsubseteq$. Viewing 0 as meaning {false} etc. the
information ordering coincides with the subset ordering and $\sqcup$ with set-union.
It is important to point out that Kleene’s logic is monotonic in this order.
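The information ordering, its join, and the monotonicity claim can be checked exhaustively on the three truth values. The following Python sketch does so; the numeric encoding (0, 1 and 0.5) and all function names are assumptions made for this illustration.

```python
from itertools import product

HALF = 0.5
VALUES = (0, 1, HALF)


def leq_info(l1, l2):
    """l1 is at least as definite as l2: equal values, or l2 is 1/2."""
    return l1 == l2 or l2 == HALF


def join(l1, l2):
    """Least upper bound in the information ordering."""
    return l1 if l1 == l2 else HALF


def k_and(a, b):
    return min(a, b)


def k_or(a, b):
    return max(a, b)


def k_not(a):
    return 1 - a


def is_monotonic(op, arity):
    """Check that more definite arguments never give a less definite result."""
    for xs in product(VALUES, repeat=arity):
        for ys in product(VALUES, repeat=arity):
            if all(leq_info(x, y) for x, y in zip(xs, ys)):
                if not leq_info(op(*xs), op(*ys)):
                    return False
    return True
```

Calling `is_monotonic` on `k_and`, `k_or` and `k_not` confirms the monotonicity of the propositional connectives under this encoding; it says nothing about quantifiers or transitive closure, which the sketch does not model.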



4 The Abstract Domain 



Motivation. The two ambients n[m[0]] and n[m[0] | m[0]] are distinct
because the former has only one occurrence of m inside n whereas the latter has
two. In other words, the collection of constituents of an ambient denotes a multiset
rather than a set.

To facilitate the use of classical notions from logic we want to view the
collection of constituents as a set rather than as a multiset. We do this by
introducing a set of individuals, so that the collection of constituents simply
is a set of individuals. Informally, the above ambients will be represented as
(u1 : n)[ (u2 : m)[ 0 ] ] and (u1 : n)[ (u2 : m)[ 0 ] | (u3 : m)[ 0 ] ], respectively.

Once we have introduced the notion of individuals we are ready to model
ambients by structures of the kind already mentioned in Section 3 and defined
formally below. These structures are obtained by fixing the set of predicate
symbols so as to be able to represent ambients; we shall use the predicates
shown in Table 2. In particular, there is a binary relation pa to represent the
parent relation between individuals, and a number of unary relation symbols to
represent the ambient information associated with individuals. Returning to the
two ambients above we have that n[m[0] | m[0]] yields

\[
\iota(pa)(u, u') =
\begin{cases}
1 & \text{if } u \in \{u_2, u_3\} \wedge u' = u_1 \\
0 & \text{otherwise}
\end{cases}
\]

and similarly for n[m[0]].
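To make the move from multisets to sets of individuals concrete, here is a small Python sketch that assigns a fresh individual to every node of an ambient tree and records the pa relation as a set of (child, parent) pairs on which the predicate is 1. The input encoding (a list of `(label, children)` pairs) and the function name `to_structure` are hypothetical, chosen only for this illustration; capabilities and replication would be handled the same way with other labels.

```python
from itertools import count


def to_structure(procs):
    """Turn a list of (label, children) trees into a plain structure.

    Returns (unary, pa): `unary` maps each fresh individual to its label,
    and `pa` is the set of (child, parent) pairs on which pa yields 1.
    Every pair not in `pa` is understood to yield 0.
    """
    fresh = count(1)
    unary = {}
    pa = set()

    def walk(components, parent):
        for label, children in components:
            u = f"u{next(fresh)}"        # a fresh individual per occurrence
            unary[u] = label
            if parent is not None:
                pa.add((u, parent))
            walk(children, u)

    walk(procs, None)
    return unary, pa


# n[m[0]] and n[m[0] | m[0]]: the second has two distinct individuals for m.
one_m = [("n", [("m", [])])]
two_m = [("n", [("m", []), ("m", [])])]
```

Applied to `two_m`, the sketch produces the individuals u1 : n, u2 : m and u3 : m with pa holding exactly on (u2, u1) and (u3, u1), matching the interpretation of pa displayed above.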



Plain and blurred structures. We shall first fix the set of predicates to be
used in the logic. Let $N$ be a finite and non-empty set of ambient names and let
$Pr = Pr[1] \cup Pr[2]$ be given by the following sets of predicates:

\[
\begin{aligned}
Pr[1] &= \{sm\} \cup \{!\} \cup \{in\ m,\ out\ m,\ open\ m \mid m \in N\} \cup \{m \mid m \in N\} \\
Pr[2] &= \{pa\}
\end{aligned}
\]

[Figure 2: tree diagram over the individuals u1 : p, u2 : r1, u3 : r2, u4 : r3,
u5 : in r1, u6 to u9 : !, u10 : open r, u11 to u13 : r, u14 to u16 : in p,
u17 : out r1, u18 and u19 : out r2, u20 : in r2, u21 : in r3, u22 : in r1.]

Fig. 2. A plain (two-valued) structure for the running example.

A blurred structure is then a three-valued interpretation $T = \langle U, \iota \rangle_3$ in the sense
of Section 3 that satisfies the following conditions:

— The set $U$ is countably infinite and $\forall u \in U : \iota(sm)(u) \neq 1$.

— The set $U_{ac}$ defined below is finite:

\[
U_{ac} = \{ u \in U \mid \exists p \in Pr[1] \setminus \{sm\} : \iota(p)(u) \neq 0\ \vee\
\exists u' \in U : (\iota(pa)(u, u') \neq 0 \vee \iota(pa)(u', u) \neq 0) \}
\]

A plain structure $S = \langle U, \iota \rangle_2$ is a two-valued structure satisfying the above
conditions; hence the unary predicate sm maps all individuals to 0. Plain structures
suffice for representing ambients precisely. Blurred structures are only needed in
order to obtain a computable analysis.

Example 2. Fig. 2 shows the plain structure that corresponds to the program
of Example 1. Directed edges represent the parent relation and individuals are
annotated with those unary predicates that give the value 1.

Fig. 3 shows a blurred structure for the same program. It is obtained by
merging individuals from Fig. 2 that satisfy the same unary predicates. As an
example the individuals $u_5$ and $u_{22}$ in Fig. 2 are merged into a summary individual
$u_{5,22}$ in Fig. 3; this individual is now surrounded by a dotted box since its sm
value is 1/2. Also, it has two outgoing dotted edges which describe the two
potential places in the ambient hierarchy where the capability can occur; dotted
edges mean that the pa predicate evaluates to 1/2.



Representations of ambients. Table 3 defines a one-to-one (but not
necessarily onto) mapping $[\![\cdot]\!]$ from mobile ambients into plain structures. It makes
use of the operation empty that returns a structure $\langle U, \iota \rangle$ where all predicates
are interpreted so that they yield 0. The operation $new(p, \langle U, \iota \rangle)$ (for $p \in Pr[1]$)
returns a structure $\langle U', \iota' \rangle$ that is as $\langle U, \iota \rangle$ except that it now contains an
additional element $r$ not in $U_{ac}$; the predicate $p$ is set to 1 on $r$ and all other
predicates involving $r$ are set to 0. Some of the individuals $r'$ of $U$ will serve
as “roots” (of subprocesses represented by the structure) and the predicate pa
is set to 1 on the pairs $(r', r)$ and 0 elsewhere. We shall not formalise the
concept of “roots” but only mention that the individual $r$ will be the “root” of the
new structure. Finally, the operation $\langle U, \iota \rangle \uplus \langle U', \iota' \rangle$ returns a structure $\langle U'', \iota'' \rangle$
where $U'' = U \cup U'$ and we take care to rename the individuals of $U$ and $U'$ such
that $U \cap U' = \emptyset$; the “roots” are the union of those of the two structures. Fig. 2
shows the result of applying $[\![\cdot]\!]$ to the program of Example 1.

[Figure 3: diagram over the individuals u1 : p, u2 : r1, u3 : r2, u4 : r3,
u5,22 : in r1, u6,7,8,9 : !, u10 : open r, u11,12,13 : r, u14,15,16 : in p,
u17 : out r1, u18,19 : out r2, u20 : in r2, u21 : in r3.]

Fig. 3. A blurred (three-valued) structure for the running example.

Table 3. The mapping $[\![\cdot]\!]$ from ambients to structures.

\[
\begin{array}{rclcrcl}
[\![0]\!] & = & empty & \qquad & [\![in\ m.P]\!] & = & new(in\ m, [\![P]\!]) \\
[\![m[P]]\!] & = & new(m, [\![P]\!]) & & [\![out\ m.P]\!] & = & new(out\ m, [\![P]\!]) \\
[\![P_1 \mid P_2]\!] & = & [\![P_1]\!] \uplus [\![P_2]\!] & & [\![open\ m.P]\!] & = & new(open\ m, [\![P]\!]) \\
 & & & & [\![!P]\!] & = & new(!, [\![P]\!])
\end{array}
\]

Embeddings. Next, we define an embedding order on structures. 

Definition 1. Let $T = \langle U, \iota \rangle_k$ and $T' = \langle U', \iota' \rangle_k$ be two structures (for $k$ being
2 or 3) and let $f : U \to U'$ be a surjective function. We say that $f$ embeds $T$ in
$T'$ (written $T \sqsubseteq^{f} T'$) if

— for every $p \in Pr[k] \setminus \{sm\}$:

\[
\iota'(p)(u'_1, \ldots, u'_k) \sqsupseteq \bigsqcup_{f(u_i) = u'_i,\ 1 \leq i \leq k} \iota(p)(u_1, \ldots, u_k) \qquad (1)
\]

— for every $u' \in U'$:

\[
\iota'(sm)(u') \sqsupseteq \big( |\{ u \mid f(u) = u' \}| > 1 \big) \sqcup \bigsqcup_{f(u) = u'} \iota(sm)(u) \qquad (2)
\]

The embedding is tight if equalities hold in (1) and (2).

We say that $T$ can be embedded in $T'$ (denoted by $T \sqsubseteq T'$) if there exists a
function $f$ such that $T \sqsubseteq^{f} T'$.

We can now establish an Embedding Theorem [18] that intuitively says:

If $T$ can be embedded in $T'$, then every piece of information extracted
from $T'$ via a formula $\varphi$ is a conservative approximation of the
information extracted from $T$ via $\varphi$.

The first usage of the embedding order is to define the concretization: the set of 
plain structures (and hence ambients) described by a blurred structure. 

Definition 2. For a blurred structure $T$, we denote by $\mathcal{E}(T)$ the set of plain
structures $S$ that can be embedded into $T$.

If a formula evaluates to 1 over a blurred structure $T$ then it is true in all
plain structures $S \in \mathcal{E}(T)$; if it evaluates to 0 then it is false in all plain structures
$S \in \mathcal{E}(T)$; finally, if it evaluates to 1/2 it may be true in some $S_1 \in \mathcal{E}(T)$ and
false in some $S_0 \in \mathcal{E}(T)$.

Example 3. For the program of Example 1 we may be interested in the properties

\[
\mathit{unique} = \forall v_1, v_2 : p(v_1) \wedge p(v_2) \Rightarrow v_1 = v_2 \qquad (3)
\]

\[
\mathit{position} = \forall v_1, v_2, v_3 : p(v_1) \wedge pa^{+}(v_1, v_2) \wedge pa^{+}(v_1, v_3) \wedge ro(v_2) \wedge ro(v_3)
\Rightarrow v_2 = v_3 \qquad (4)
\]

where $ro(v) = r_1(v) \vee r_2(v) \vee r_3(v)$. The formula (3) expresses that the structure
only contains a single copy of the ambient $p$. The formula (4) expresses that
the ambient $p$ will be within at most one of the ambients $r_1$, $r_2$ and $r_3$. These
formulae have the value 1 when evaluated on the structures of Figures 2 and 3.
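The evaluation of formula (3) can be sketched directly from the three-valued semantics of Section 3. The Python sketch below is an illustration under stated assumptions: truth values are encoded as 0, 1 and 0.5, the universe is restricted to finitely many active individuals (every other individual makes the premise 0 and hence the implication 1), and the names `k_eq` and `unique` are hypothetical.

```python
def k_eq(u1, u2, sm):
    """Three-valued equality governed by the summary predicate sm."""
    if u1 != u2:
        return 0                      # non-identical individuals are not equal
    return 1 if sm.get(u1, 0) == 0 else 0.5


def unique(universe, p, sm):
    """Evaluate: for all v1, v2, p(v1) and p(v2) implies v1 = v2.

    `p` and `sm` map individuals to truth values, with missing entries
    read as 0. Universal quantification is a minimum over the universe,
    and an implication is max(1 - premise, conclusion).
    """
    return min(
        max(1 - min(p.get(v1, 0), p.get(v2, 0)), k_eq(v1, v2, sm))
        for v1 in universe
        for v2 in universe
    )
```

On a structure with one non-summary individual satisfying `p` the result is 1, with two such individuals it is 0, and with a single summary individual it is 1/2, since the individual may stand for several copies.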

Bounded Structures and Canonical Embeddings. A simple way to
guarantee the termination of our analysis is by ensuring that the number of blurred
structures is a priori bounded. A blurred structure $T = \langle U, \iota \rangle$ is bounded if
for every two different elements $u_1, u_2 \in U_{ac}$ there exists a unary predicate
$p \in Pr[1] \setminus \{sm\}$ such that $\iota(p)(u_1) \neq \iota(p)(u_2)$. Clearly, the number of different
individuals in $U_{ac}$ is then bounded by $3^{|Pr[1] \setminus \{sm\}|}$, which is exponential in $|N|$,
and thus the number of bounded structures is finite (up to isomorphism).

Moreover, every blurred structure can be embedded into a bounded structure
by “joining” the truth-values of individuals mapping into the same abstract
individual. More precisely, a special kind of tight embedding, called canonical
embedding, from structures into bounded structures is obtained by defining the
embedding $f$ to map individuals $u_1$ and $u_2$ in $U_{ac}$ to the same individual if and
only if there is no predicate $p \in Pr[1] \setminus \{sm\}$ such that
$\iota(p)(u_1) \neq \iota(p)(u_2)$. Since the canonical embedding $f$ is uniquely defined on $T$
(up to isomorphism) we denote $f(T)$ simply as $\lfloor T \rfloor$.
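A minimal sketch of the canonical embedding, assuming a finite set of active individuals and a dictionary-based encoding of the interpretation (both choices, like the name `canonical`, are made only for this illustration). Individuals that agree on all unary predicates other than sm are merged, and the values of the merged individual are obtained by joining, so that conditions (1) and (2) hold with equality.

```python
HALF = 0.5


def join(values):
    """Join in the information ordering: a common value, or 1/2 on disagreement."""
    vals = set(values)
    return vals.pop() if len(vals) == 1 else HALF


def canonical(universe, unary, pa, sm):
    """Merge individuals that agree on every unary predicate except sm.

    unary: {predicate: {individual: value}}, pa: {(child, parent): value},
    sm: {individual: value}; missing entries are read as 0.
    Returns the merged universe, unary, pa, sm and the embedding f.
    """
    preds = sorted(unary)
    classes = {}
    for u in universe:
        key = tuple(unary[p].get(u, 0) for p in preds)
        classes.setdefault(key, []).append(u)
    f = {u: tuple(ms) for ms in classes.values() for u in ms}
    new_universe = sorted(set(f.values()))
    new_unary = {
        p: {c: join(unary[p].get(u, 0) for u in c) for c in new_universe}
        for p in preds
    }
    new_pa = {
        (c, d): join(pa.get((u, v), 0) for u in c for v in d)
        for c in new_universe
        for d in new_universe
    }
    # A class with several members becomes a summary individual.
    new_sm = {
        c: join([1 if len(c) > 1 else 0] + [sm.get(u, 0) for u in c])
        for c in new_universe
    }
    return new_universe, new_unary, new_pa, new_sm, f
```

For two ambients named m that sit inside different parents, the merged individual gets sm value 1/2 and its pa edges to both parents become 1/2, which is the effect drawn with dotted boxes and dotted edges in the figures.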

Example 4. The blurred structure in Fig. 3 is the canonical embedding of the
plain structure of Fig. 2.

Table 4. Shorthand formulae used in the transition formulae.

    formula        intended meaning                        formal meaning
    nb(v)          v is a non-blocking action              $\bigvee_{m \in N} m(v)$
    nba(v)         all ancestors of v are non-blocking     $\forall v_1 : pa^{+}(v, v_1) \Rightarrow nb(v_1)$
    sib(v1, v2)    v1 and v2 are sibling individuals       $v_1 \neq v_2 \wedge \big( (\exists v_p : pa(v_1, v_p) \wedge pa(v_2, v_p))$
                                                           $\vee \neg(\exists v_p : pa(v_1, v_p) \vee pa(v_2, v_p)) \big)$
    ac(v)          v is an active individual               $\big( \bigvee_{p \in Pr[1] \setminus \{sm\}} p(v) \big)$
                                                           $\vee\ \exists v' : (pa(v, v') \vee pa(v', v))$

5 A Simple Analysis 

We now define the effect of ambient actions by formulae that compute new 
structures from old. These formulae are quite natural since when interpreted 
over plain structures they define a semantics which is equivalent to the one 
of Section 2 but when interpreted over blurred structures, they are conservative 
due to the Embedding Theorem. Restricting our attention to bounded structures 
thus gives us a conservative and efficient analysis of ambients. 

Capability actions. Table 5 defines the effect of capability actions using the
shorthand formulae defined in Table 4. The semantics of Table 5 is formally
defined in Definition 3. Informally, an action is characterized by the following
kinds of information:

— The condition in which the action is enabled. It is specified as a formula
with free logical variables $f_c, f_1, f_2, \ldots, f_n$ ranging over individuals. The
formal variable $f_c$ denotes the current individual and the rest of $f_1, f_2, \ldots, f_n$
denote surrounding individuals. Whenever an action is enabled, it binds the
free logical variables to actual individuals in the structure. Our operational
semantics is non-deterministic in the sense that many actions can be enabled
simultaneously and one of them is chosen for execution.

— Enabled actions create a new structure where the interpretation of every
predicate $p \in Pr[k]$ is determined by evaluating a formula $\varphi_p(v_1, v_2, \ldots, v_k)$
which may use $v_1, v_2, \ldots, v_k$ and $f_c, f_1, f_2, \ldots, f_n$ as well as all $p \in Pr$.

For simplicity, our semantics does not deal with the removal of individuals (and
hence our structures use an infinite set of individuals).

Consider the treatment of the in m action in Table 5. It is enabled when an
individual $f_c$ is non-blocking (i.e. when there are no capabilities or replication
operators on a path to a root of the structure), it denotes an in m action, and it has a
parent $f_p$ with a sibling individual $f_s$ which denotes an ambient named m. Next
the enabled in m action creates a parent relation, where $f_p$ is connected to $f_s$,
predecessors of $f_c$ are connected to $f_p$, and individuals which are not emanating
from $f_c$ or $f_p$ and not entering $f_c$ are not changed. On plain structures this
operation exactly emulates the operational semantics (defined informally in Fig. 1).
However, on blurred structures it can yield indefinite values, as demonstrated by
the blurred structure in Fig. 4, which is obtained by performing the action in
r1 on the blurred structure shown in Fig. 3.

Table 5. The structure update formulae for executing capability actions.

Formally, the meaning of capability actions is defined as follows: 

Definition 3. We say that an n-valued structure $T = \langle U, \iota \rangle$ rewrites into a
structure $T' = \langle U', \iota' \rangle$ (denoted by $T \to_M T'$) where $M \in \{in\ m, out\ m, open\ m \mid m \in N\}$
if there exists an assignment $Z$ such that $[\![c_M(f_c, f_1, f_2, \ldots, f_n)]\!]^{T}(Z) \neq 0$,
where the formula $c_M(f_c, f_1, f_2, \ldots, f_n)$ is defined in the first row of Table 5,
and for every $p \in Pr[k]$ and $u_1, \ldots, u_k \in U'$,

\[
\iota'(p)(u_1, \ldots, u_k) = [\![\varphi_p(v_1, v_2, \ldots, v_k)]\!]^{T}(Z[v_1 \mapsto u_1, v_2 \mapsto u_2, \ldots, v_k \mapsto u_k])
\]

where $\varphi_p(v_1, \ldots, v_k)$ is the formula for $p$ given in Table 5.

Table 6. The structure update formulae for executing the replication operation.

Action:
\[
c_!(f_c, I^2) = nba(f_c) \wedge\ !(f_c)\ \wedge\
\forall v_1 : pa^{+}(v_1, f_c) \Rightarrow \exists v_2 : \big( \neg ac(v_2) \wedge I^2(v_1, v_2) \wedge (\forall v_3 : I^2(v_1, v_3) \Rightarrow v_2 = v_3) \big)
\]

Diagram of $T$: [diagram of the replication node $f_c$ below its parent $f_p$]

\[
\begin{aligned}
\varphi_{pa}(v_1, v_2) ={} & pa(v_1, v_2) \vee (\exists v'_1, v'_2 : pa(v'_1, v'_2) \wedge I^2(v'_1, v_1) \wedge I^2(v'_2, v_2)) \\
 & \vee (\exists v'_1 : pa(v'_1, f_c) \wedge I^2(v'_1, v_1) \wedge pa(f_c, v_2)) \\
p \in Pr[1] :\ \varphi_p(v) ={} & p(v) \vee \exists v' : p(v') \wedge I^2(v', v)
\end{aligned}
\]

Diagram of $T'$: [diagram of the structure after replication]

After reduction of in r1 at $f_c = u_{5,22}$, $f_p = u_1$, $f_s = u_2$:

[Figure 4: diagram over the individuals u1 : p, u2 : r1, u3 : r2, u4 : r3,
u5,22 : in r1, u6,7,8,9 : !, u10 : open r, u11,12,13 : r, u14,15,16 : in p.]

Fig. 4. The bounded structures arising in analysis after the first iteration.

Replication actions. In Table 6 we define the meaning of the replication
action; we express this operation in logic by using a second order relation $I^2$
that creates new isomorphic individuals for the replicated ambients:

— The condition under which the operation is enabled: that the current
individual $f_c$ is a non-blocking replication operation.

— The update formula $\varphi$. Here we use an extra preset relation $I^2(v_1, v_2)$ which is
set to true if $v_2$ is a new instance of an individual $v_1$. This relation is set before
the formula $\varphi_p(v_1, v_2, \ldots, v_k)$ is evaluated. The formula $\varphi_p(v_1, v_2, \ldots, v_k)$
uses $v_1, v_2, \ldots, v_k$, $f_c$, and the new relation $I^2$.

On plain structures the replication operation exactly emulates the operational
semantics. However, on blurred structures it can yield indefinite values, and even
more so when the resultant structure is converted into a bounded one.

Formally, replication is handled as follows:

Definition 4. We say that an n-valued structure $T = \langle U, \iota \rangle$ rewrites into a structure
$T' = \langle U', \iota' \rangle$ (denoted by $T \rightarrow_! T'$) if there exists an assignment $Z$ such that
$[\![c_!(fc, I)]\!]_n^{T}(Z) \neq 0$, where the formula $c_!(fc, I)$ is defined in the first row of
Table 6, and for every $p \in Pr[k]$ and $u_1, u_2, \ldots, u_k \in U'$,

\[ \iota'(p)(u_1, u_2, \ldots, u_k) = [\![\varphi_p(v_1, v_2, \ldots, v_k)]\!]_n^{T}(Z[v_1 \mapsto u_1, v_2 \mapsto u_2, \ldots, v_k \mapsto u_k]) \]

where $\varphi_p(v_1, \ldots, v_k)$ is the formula for $p$ given in Table 6.

Finally, we can formally define the analysis (or abstract semantics):

Definition 5. We say that an n-valued structure $T = \langle U, \iota \rangle$ rewrites into a structure
$T' = \langle U', \iota' \rangle$ (denoted by $T \rightarrow T'$) if either $T \rightarrow_M T'$ or $T \rightarrow_! T'$. We say that $T$
canonically rewrites into $T'$ (denoted by $T \Rightarrow T'$) when $T = \lfloor T \rfloor$ and there exists
$T''$ such that $T' = \lfloor T'' \rfloor$ and $T \rightarrow T''$. We denote by $\rightarrow^{*}$ the reflexive transitive
closure of $\rightarrow$, and similarly $\Rightarrow^{*}$ for $\Rightarrow$.


Properties of the Abstract Semantics. The set of bounded structures 

AnT^%*T'} (5) 

is finite and can be computed iteratively in a straightforward manner. 

We can show that the semantics of processes, P Q, is correctly modelled 
by our plain rewrite relation, P ^ Q. It then follows from the Embedding 
Theorem that the semantics of processes is correctly modelled by our blurred 
rewrite relation: 

Whenever P Q we have 3T G : Q QT. (6) 

Thus, we can verify safety properties of ambients by evaluating formulae against 
blurred structures in Anx- Of course, Anx may also include superfluous struc- 
tures leading to imprecise results. 



6 Conclusion 

So far we have presented a very naive analysis of ambients (in the spirit of 
the shape analysis algorithm in [5, Section 3]). The motivation was to show 
the benefit of three-valued logics. We have implemented the analysis using the 
TVLA system [12] but in its current form it is too costly and too imprecise to 
give informative answers to the questions asked of the running example. 

To get an efficient and sufficiently precise analysis we have refined the analysis 
by using two techniques already mentioned in [18]. One is to maintain finer 
distinctions on blurred structures based on values of so-called instrumentation 
predicates and the other is to control the complexity by joining structures. 
Table 7. Definition of instrumentation predicates.

action     defining formula

in m       for each ambient name $z$:
           $inside[z, m] = inside[z, m] \vee \exists v : z(v) \wedge pa^{*}(v, fp)$

out m      for each ambient name $z$:
           $inside[z, m] = inside[z, m] \wedge$
           $(\ (\exists v_1, v_2 : z(v_1) \wedge pa^{*}(v_1, fp) \wedge m(v_2) \wedge pa^{+}(fpp, v_2)) \vee$
           $\ \ (\exists v_1, v_2 : z(v_1) \wedge m(v_2) \wedge pa^{+}(v_1, v_2) \wedge \neg(pa^{*}(v_1, fp) \wedge pa^{*}(fp, v_2)))\ )$

open m     for each ambient name $z$:
           $inside[z, m] = inside[z, m] \wedge$
           $(\ (\exists v_1, v_2 : z(v_1) \wedge pa^{+}(v_1, fs) \wedge m(v_2) \wedge pa^{*}(fp, v_2)) \vee$
           $\ \ (\exists v_1, v_2 : z(v_1) \wedge m(v_2) \wedge pa^{+}(v_1, v_2) \wedge \neg(pa^{*}(v_1, fs) \wedge pa^{*}(fs, v_2)))\ )$



Instrumentation predicates. Technically, instrumentation predicates are just 
predicates which store some context information. For our analysis we have added 
two kinds of instrumentation predicates. One group of predicates simply labels 
the individual ambients and capabilities of the program much as in [9,16]. As 
an example, the initial blurred structure will now be similar to the initial plain 
structure displayed in Fig. 2 as now the active individuals will have distinct labels 
and hence will not be combined. The labels will remain unchanged throughout 
the analysis and when a part of a structure is copied as in the analysis of re- 
plication the new individuals inherit the labels of the original individuals. The 
benefit of adding these instrumentation predicates is that the analysis can better 
distinguish between the different routers in which the packet resides. 

Another group of instrumentation predicates is designed to reflect the three
questions we have asked, all of which concern which ambients are inside
which ambients. We therefore define nullary predicates $inside[z_1, z_2]$ for each
pair of ambient names $(z_1, z_2)$; each is defined by

\[ inside[z_1, z_2] = \exists v_1, v_2 : z_1(v_1) \wedge z_2(v_2) \wedge pa^{+}(v_1, v_2) \]

and is updated whenever one of the capabilities is executed, as shown in Table
7; it is unchanged when the replication action is executed.
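
As an illustration only, the defining formula of inside can be evaluated on a plain (two-valued) structure as sketched below. The sketch assumes that the structure is a tree, that pa(v1, v2) holds when v2 is the parent of v1, and that names are stored as sets of labels; the function name, the dictionaries and the sample individuals are hypothetical and are not taken from the TVLA implementation.

```python
def inside(parent, labels, z1, z2):
    """Return True if some individual named z1 has a proper ancestor named z2.

    parent: dict mapping each individual to its parent (roots are absent).
    labels: dict mapping each individual to the set of names that hold on it.
    Assumes a plain, tree-shaped structure (no indefinite values, no cycles).
    """
    for v1, names in labels.items():
        if z1 not in names:
            continue
        ancestor = parent.get(v1)           # start at the parent: pa+ needs at least one step
        while ancestor is not None:
            if z2 in labels.get(ancestor, set()):
                return True
            ancestor = parent.get(ancestor)
    return False


# Hypothetical structure: a packet p inside router r1, which sits at the top level.
parent = {"u_p": "u_r1", "u_r1": "u_top"}
labels = {"u_p": {"p"}, "u_r1": {"r1"}, "u_top": {"top"}}
print(inside(parent, labels, "p", "r1"))
```

On blurred structures the same formula would be evaluated in three-valued logic, which this sketch does not attempt.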

Joining structures. While the goal of adding instrumentation predicates is to 
get more precision, the goal of joining structures is to get more efficiency and 
as usual this means that we are going to lose precision. We therefore merge
structures satisfying the same nullary instrumentation predicates. 

With these modifications the system of [12] can indeed validate the two 
properties of Example 3. This took 192.6 CPU seconds on a Pentium 256 MHz
machine running NT 4.0 with JDK 1.2. 

Acknowledgements. The running example was suggested by Luca Cardelli. 
Tal Lev-Ami provided the implementation discussed in Section 6. 
Abstract. Extending a subtyping-constraint-based type inference framework
with conditional constraints and rows yields a powerful type inference
engine. We illustrate this claim by proposing solutions to three
delicate type inference problems: “accurate” pattern matchings, record 
concatenation, and “dynamic” messages. Until now, known solutions re- 
quired significantly different techniques; our theoretical contribution is 
in using only a single (and simple) set of tools. On the practical side, 
this allows all three problems to benefit from a common set of constraint 
simplification techniques, leading to efficient solutions. 



1 Introduction 

Type inference is the task of examining a program which lacks some (or even all) 
type annotations, and recovering enough type information to make it acceptable 
by a type checker. Its original, and most obvious, application is to free the 
programmer from the burden of manually providing these annotations, thus 
making static typing a less dreary discipline. However, type inference has also 
seen heavy use as a simple, modular way of formulating program analyses. 

This paper presents a common solution to several seemingly unrelated type
inference problems, by unifying in a single type inference system several
previously proposed techniques, namely: a simple framework for subtyping-constraint-based
type inference [15], conditional constraints inspired by Aiken, Wimmers
and Lakshman [2], and rows à la Rémy [18].

Constraint-Based Type Inference 

Subtyping is a partial order on types, defined so that an object of a subtype may 
safely be supplied wherever an object of a supertype is expected. Type inference 
in the presence of subtyping reflects this basic principle. Every time a piece 
of data is passed from a producer to a consumer, the former’s output type is 
required to be a subtype of the latter’s input type. This requirement is explicitly 
recorded by creating a symbolic subtyping constraint between these types. Thus, 
each potential data flow discovered in the program yields one constraint. This 
fact allows viewing a constraint set as a directed approximation of the program’s 
data flow graph - regardless of our particular definition of subtyping. 

Various type inference systems based on subtyping constraints exist. One 
may cite works by Aiken et al. [1, 2, 5], the present author [16, 15], Trifonov 
and Smith [22], as well as an abstract framework by Odersky, Sulzmann and
Wehr [12]. Related systems include set-based analysis [8, 6] and type inference 
systems based on feature constraints [9, 10]. 

Conditional Constraints 

In most constraint-based systems, the expression if $e_0$ then $e_1$ else $e_2$ may,
at best, be described by

\[ \alpha_1 \le \alpha \;\wedge\; \alpha_2 \le \alpha \]

where $\alpha_i$ stands for $e_i$’s type, and $\alpha$ stands for the whole expression’s type.
This amounts to stating that “$e_1$’s (resp. $e_2$’s) value may become the whole
expression’s value”, regardless of the test’s outcome. A more precise description
- “if $e_0$ may evaluate to true (resp. false), then $e_1$’s (resp. $e_2$’s) value may
become the whole expression’s value” - may be given using natural conditional
constraints:

\[ \mathsf{true} \le \alpha_0 \;?\; \alpha_1 \le \alpha \;\wedge\; \mathsf{false} \le \alpha_0 \;?\; \alpha_2 \le \alpha \]

Introducing tests into constraints allows keeping track of the program’s control 
flow - that is, mirroring the way evaluation is affected by a test’s outcome, at 
the level of types. 

Conditional set expressions were introduced by Reynolds [21] as a means 
of solving set constraints involving strict type constructors and destructors. 
Heintze [8] uses them to formulate an analysis which ignores “dead code”. He 
also introduces case constraints, which allow ignoring the effect of a branch, in 
a case construct, unless it is actually liable to be taken. Aiken, Wimmers and 
Lakshman [2] use conditional types, together with intersection types, for this 
purpose. 

In the present paper, we suggest a single notion of conditional constraint, 
which is comparable in expressive power to the above constructs, and lends itself 
to a simple and efficient implementation. (A similar choice was made independently
by Fähndrich [5].) We emphasize its use as a way not only of introducing
control into types, but also of delaying type computations, thus introducing some 
“laziness” into type inference. 



Rows 

Designing a type system for a programming language with records, or objects, 
requires some way of expressing labelled products of types, where labels are 
field or method names. Dually, if a programming language allows manipulating 
structured data, then its type system shall likely require labelled sums, where 
labels are names of data constructors. 

Rémy [18] elegantly deals with both problems at once by introducing notation
to express denumerable, indexed families of types, called rows:

\[ \rho ::= \alpha, \beta, \ldots, \varphi, \psi, \ldots \mid a : \tau;\ \rho \mid \partial\tau \]

(Here, $\tau$ ranges over types, and $a, b, \ldots$ range over indices.) An unknown row
may be represented by a row variable, exactly as in the case of types. (For lack
of symbols, we shall not syntactically distinguish regular type variables and row
variables.) The term $a : \tau; \rho$ represents a row whose element at index $a$ is $\tau$,
and whose other elements are given by $\rho$. The term $\partial\tau$ stands for a row whose
element at any index is $\tau$. These informal explanations are made precise via an
equational theory:

\[ a : \tau_a;\ (b : \tau_b;\ \rho) = b : \tau_b;\ (a : \tau_a;\ \rho) \]
\[ \partial\tau = a : \tau;\ \partial\tau \]



For more details, we refer the reader to [18]. 
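
To make the two equations concrete, here is a minimal sketch of closed rows (rows without row variables) as a finite set of explicit entries plus a default element. The representation, and the names Row, uniform, extend and at, are assumptions made for illustration; they are not taken from [18] or from the prototype. Normalising away entries that equal the default is what makes both equations hold as plain equality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """A closed row: finitely many explicit entries plus a default for all other indices."""
    entries: tuple   # sorted (index, type) pairs whose type differs from `default`
    default: str


def uniform(t):
    """The row whose element at any index is t."""
    return Row((), t)


def extend(a, t, row):
    """The row (a : t; row); `row` must not already mention index a."""
    items = dict(row.entries)
    if a in items:
        raise ValueError(f"index {a!r} is already defined")
    if t != row.default:          # entries equal to the default are redundant
        items[a] = t
    return Row(tuple(sorted(items.items())), row.default)


def at(row, a):
    """The element of `row` at index a."""
    return dict(row.entries).get(a, row.default)


r1 = extend("a", "int", extend("b", "bool", uniform("abs")))
r2 = extend("b", "bool", extend("a", "int", uniform("abs")))
```

Here r1 and r2 are equal (the first equation), and extending a uniform row with its own default element gives back the uniform row (the second equation).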

Rows offer a particularly straightforward way of describing operations which 
treat all labels (except possibly a finite number thereof) uniformly. Because every 
facility available at the level of types (e.g. constructors, constraints) can also be 
made available at the level of rows, a description of what happens at the level 
of a single label - written using types - can also be read as a description of the 
whole operation - written using rows. This interesting point will be developed 
further in the paper. 



Putting It All Together 

Our point is to show that the combination of the three concepts discussed above 
yields a very expressive system, which allows type inference for a number of 
advanced language features. Among these, “accurate” pattern matching con- 
structs, record concatenation, and “dynamic” messages will be discussed in this 
paper. Our system allows performing type inference for all of these features at 
once. Furthermore, efficiency issues concerning constraint-based type inference 
systems have already been studied [5, 15]. This existing knowledge benefits our 
system, which may thus be used to efficiently perform type inference for all of
the above features. 

In this paper, we focus on applications of our type system, i.e. we show how 
it allows solving each of the problems mentioned above. Theoretical aspects of 
constraint solving are discussed in [15, 17]. Furthermore, a robust prototype 
implementation is publicly available [14]. We do not prove that the types given 
to the three problematic operations discussed in this paper are sound, but we 
believe this is a straightforward task. 

The paper is organized as follows. Section 2 gives a brief technical overview 
of the type system, focusing on the notion of constrained type scheme, which 
should be enough to gain an understanding of the paper. Sections 3, 4, and 5 
discuss type inference for “accurate” pattern matchings, record concatenation, 
and “dynamic” messages, respectively, within our system. Section 6 sums up 
our contribution, then briefly discusses future research topics. Appendix A gives 
some more technical details, including the system’s type inference rules. Lastly, 
Appendix B gives several examples, which show what inferred types look like in 
practice. 
2 System’s Overview

The programming language considered throughout the paper is a call-by-value 
$\lambda$-calculus with let-polymorphism, i.e. essentially core ML.

\[ e ::= x, y, \ldots \mid \lambda x.e \mid (e\ e) \mid X, Y, \ldots \mid \mathtt{let}\ X = e\ \mathtt{in}\ e \]

The type algebra needed to deal with such a core language is simple. The set
of ground terms contains all regular trees built over $\bot$, $\top$ (with arity 0) and $\to$
(with arity 2). It is equipped with a straightforward subtyping relationship [15],
denoted $\le$, which makes it a lattice. It is the logical model in which subtyping
constraints are interpreted.

Symbols, type variables, types and constraints are defined as follows:

\[ s ::= \bot \mid {\to} \mid \top \qquad\qquad v ::= \alpha, \beta, \ldots \]
\[ \tau ::= v \mid \bot \mid \tau \to \tau \mid \top \qquad\qquad c ::= \tau \le \tau \mid s \le v \;?\; \tau \le \tau \]

A ground substitution $\phi$ is a map from type variables to ground terms. A constraint
of the form $\tau_1 \le \tau_2$, which reads “$\tau_1$ must be a subtype of $\tau_2$”, is satisfied
by $\phi$ if and only if $\phi(\tau_1) \le \phi(\tau_2)$. A constraint of the form $s \le \alpha \;?\; \tau_1 \le \tau_2$,
which reads “if $\alpha$ exceeds $s$, then $\tau_1$ must be a subtype of $\tau_2$”, is satisfied by $\phi$ if
and only if $s \le_S \mathrm{head}(\phi(\alpha))$ implies $\phi(\tau_1) \le \phi(\tau_2)$, where head maps a ground
term to its head constructor, and $\le_S$ is the expected ordering over symbols. A
constraint set $C$ is satisfied by $\phi$ if and only if all of its elements are.
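
The definition of satisfaction can be read as a small decision procedure. The sketch below is an illustration under simplifying assumptions: ground terms are finite trees rather than regular trees, the ordering over symbols is taken to be bottom, then arrow, then top, and all names (BOT, TOP, arrow, var, apply, subtype, head, satisfies) are hypothetical.

```python
BOT, TOP, ARROW = "bot", "top", "->"
RANK = {BOT: 0, ARROW: 1, TOP: 2}      # assumed symbol ordering: bot <= -> <= top


def arrow(a, b):
    return (ARROW, a, b)


def var(name):
    return ("var", name)


def apply(phi, t):
    """Apply the ground substitution phi (a dict from variable names to ground terms) to t."""
    if isinstance(t, tuple) and t[0] == "var":
        return phi[t[1]]
    if isinstance(t, tuple):
        return arrow(apply(phi, t[1]), apply(phi, t[2]))
    return t


def subtype(t1, t2):
    """Subtyping on finite ground terms; arrows are contravariant in the domain."""
    if t1 == BOT or t2 == TOP:
        return True
    if isinstance(t1, tuple) and isinstance(t2, tuple):
        return subtype(t2[1], t1[1]) and subtype(t1[2], t2[2])
    return False


def head(t):
    """Head constructor of a ground term."""
    return ARROW if isinstance(t, tuple) else t


def satisfies(phi, c):
    """c is ("sub", t1, t2) or ("cond", s, variable name, t1, t2)."""
    if c[0] == "sub":
        return subtype(apply(phi, c[1]), apply(phi, c[2]))
    _, s, a, t1, t2 = c
    if RANK[s] <= RANK[head(phi[a])]:      # the condition is triggered
        return subtype(apply(phi, t1), apply(phi, t2))
    return True                            # otherwise vacuously satisfied


cond = ("cond", ARROW, "a", var("b"), var("g"))
idle = {"a": BOT, "b": TOP, "g": BOT}
fired = {"a": arrow(BOT, TOP), "b": TOP, "g": BOT}
```

With these definitions, cond is satisfied by idle, because its condition is not triggered, but not by fired, where the right-hand side becomes active and fails.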

A type scheme is of the form

\[ \sigma ::= \forall C.\ \tau \]

where $\tau$ is a type and $C$ is a constraint set, which restricts the set of $\sigma$’s ground
instances. Indeed, the latter, which we call $\sigma$’s denotation, is defined as

\[ \{ \tau' \;;\; \exists \phi \quad \phi \text{ satisfies } C \;\wedge\; \phi(\tau) \le \tau' \} \]

Because all of a type scheme’s variables are universally quantified, we will usually
omit the $\forall$ quantifier and simply write “$\tau$ where $C$”.

Of course, the type algebra given above is very much simplified. In gene- 
ral, the system allows defining more type constructors, separating symbols (and 
terms) into kinds, and making use of rows. (A full definition - without rows - ap- 
pears in [17].) However, for presentation’s sake, we will introduce these features 
only step by step. 

The core programming language described above is also limited. To extend it, 
we will define new primitive operations, equipped with an operational semantics 
and an appropriate type scheme. However, no extension to the type system - 
e.g. in the form of new typing rules - will be made. This explains why we do 
not further describe the system itself. (Some details are given in Appendix A.) 
Really, all this paper is about is writing expressive constrained type schemes. 
3 Accurate Analysis of Pattern Matchings

When faced with a pattern matching construct, most existing type inference 
systems adopt a simple, conservative approach: assuming that each branch may 
be taken, they let it contribute to the whole expression’s type. A more accurate 
system should use types to prove that certain branches cannot be taken, and 
prevent them from contributing. 

In this section, we describe such a system. The essential idea - introducing 
a conditional construct at the level of types - is due to [8, 2]. Some novelty 
resides in our two-step presentation, which we believe helps isolate independent 
concepts. First, we consider the case where only one data constructor exists. 
Then, we easily move to the general case, by enriching the type algebra with 
rows. 



3.1 The Basic Case 

We assume the language allows building and accessing tagged values. 

\[ e ::= \ldots \mid \mathsf{Pre} \mid \mathsf{Pre}^{-1} \]

A single data constructor, Pre, allows building tagged values, while the destructor
$\mathsf{Pre}^{-1}$ allows accessing their contents. This relationship is expressed by the
following reduction rule:

\[ \mathsf{Pre}^{-1}\ v_1\ (\mathsf{Pre}\ v_2) \quad\text{reduces to}\quad (v_1\ v_2) \]

The rule states that $\mathsf{Pre}^{-1}$ first takes the tag off the value $v_2$, then passes it to
the function $v_1$.

At the level of types, we introduce a (unary) variant type constructor $[\,\cdot\,]$.
Also, we establish a distinction between so-called “regular types,” written $\tau$, and
“field types,” written $\phi$.

\[ \tau ::= \alpha, \beta, \gamma, \ldots \mid \bot \mid \top \mid \tau \to \tau \mid [\phi] \]
\[ \phi ::= \varphi, \psi, \ldots \mid \mathsf{Abs} \mid \mathsf{Pre}\ \tau \mid \mathsf{Any} \]

A subtype ordering over field types is defined straightforwardly: Abs is its least
element, Any is its greatest, and Pre is a covariant type constructor.

The data constructor Pre is given the following type scheme:

\[ \mathsf{Pre} : \alpha \to [\mathsf{Pre}\ \alpha] \]

Notice that there is no way of building a value of type [Abs]. Thus, if an expression
has this type, then it must diverge. This explains our choice of names.
If an expression has type [Abs], then its value must be “absent”; if it has type
$[\mathsf{Pre}\ \tau]$, then some value of type $\tau$ may be “present”.
The data destructor $\mathsf{Pre}^{-1}$ is described as follows:

\[ \mathsf{Pre}^{-1} : (\alpha \to \beta) \to [\varphi] \to \gamma \]
\[ \text{where}\quad \varphi \le \mathsf{Pre}\ \alpha \]
\[ \phantom{\text{where}\quad} \mathsf{Pre} \le \varphi \;?\; \beta \le \gamma \]

The conditional constraint allows $(\mathsf{Pre}^{-1}\ e_1\ e_2)$ to receive type $\bot$ when $e_2$ has
type [Abs], reflecting the fact that $\mathsf{Pre}^{-1}$ isn’t invoked until $e_2$ produces some
value. Indeed, as long as $\varphi$ equals Abs, the constraint is vacuously satisfied, so
$\gamma$ is unconstrained and assumes its most precise value, namely $\bot$. However, as
soon as $\mathsf{Pre} \le \varphi$ holds, $\beta \le \gamma$ must be satisfied as well. Then, $\mathsf{Pre}^{-1}$’s type
becomes equivalent to $(\alpha \to \beta) \to [\mathsf{Pre}\ \alpha] \to \beta$, which is its usual ML type.

3.2 The General Case 

We now move to a language with a denumerable set of data constructors. 

\[ e ::= \ldots \mid K \mid K^{-1} \mid \mathsf{close} \]

(We let $K, L, \ldots$ stand for data constructors.) An expression may be tagged,
as before, by applying a data constructor to it. Accessing tagged values becomes
slightly more complex, because multiple tags exist. The semantics of the
elementary data destructor, $K^{-1}$, is given by the following reduction rules:

\[ K^{-1}\ v_1\ v_2\ (K\ v_3) \quad\text{reduces to}\quad (v_1\ v_3) \]
\[ K^{-1}\ v_1\ v_2\ (L\ v_3) \quad\text{reduces to}\quad (v_2\ (L\ v_3)) \quad\text{when } K \neq L \]

According to these rules, if the value $v_3$ carries the expected tag, then it is passed
to the function $v_1$. Otherwise, the value - still carrying its tag - is passed to the
function $v_2$. Lastly, a special value, close, is added to the language, but no
additional reduction rule is defined for it.

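How do we modify our type algebra to accommodate multiple data constructors?
In Section 3.1, we used field types to encode information about a tagged
value’s presence or absence. Here, we need exactly the same information, but
this time about every tag. So, we need to manipulate a family of field types,
indexed by tags. To do so, we add one layer to the type algebra: rows of field
types.

\[ \tau ::= \alpha, \beta, \gamma, \ldots \mid \bot \mid \top \mid \tau \to \tau \mid [\rho] \]
\[ \rho ::= \varphi, \psi, \ldots \mid K : \phi;\ \rho \mid \partial\phi \]
\[ \phi ::= \varphi, \psi, \ldots \mid \mathsf{Abs} \mid \mathsf{Pre}\ \tau \mid \mathsf{Any} \]

We can now extend the previous section’s proposal, as follows:

\[ K : \alpha \to [K : \mathsf{Pre}\ \alpha;\ \partial\mathsf{Abs}] \]

\[ K^{-1} : (\alpha \to \beta) \to ([K : \mathsf{Abs};\ \psi] \to \gamma) \to [K : \varphi;\ \psi] \to \gamma \]
\[ \text{where}\quad \varphi \le \mathsf{Pre}\ \alpha \]
\[ \phantom{\text{where}\quad} \mathsf{Pre} \le \varphi \;?\; \beta \le \gamma \]

\[ \mathsf{close} : [\partial\mathsf{Abs}] \to \bot \]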
$K^{-1}$’s type scheme involves the same constraints as in the basic case. Using a
single row variable, namely $\psi$, in two distinct positions allows expressing the fact
that values carrying any tag other than $K$ shall be passed unmodified to $K^{-1}$’s
second argument.

close’s argument type is $[\partial\mathsf{Abs}]$, which prevents it from ever being invoked.
This accords with the fact that close does not have an associated reduction
rule. It plays the role of a function defined by zero cases.

This system offers extensible pattern matchings: $k$-ary case constructs may
be written using k nested destructor applications and close, and receive the 
desired, accurate type. Thus, no specific language construct or type inference 
rule is needed to deal with them. 
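
The following sketch illustrates, at the level of values only, how a k-ary case construct arises from nested destructor applications and close. It models tagged values as (tag, payload) pairs; the names destructor and area, the sample tags, and the choice of raising an exception to model the stuck term close are all assumptions of this illustration.

```python
def destructor(tag):
    """Model of the destructor for `tag`, curried in its two handlers and then the scrutinee."""
    def with_handlers(on_match, otherwise):
        def run(tagged):
            label, payload = tagged
            if label == tag:
                return on_match(payload)      # expected tag: pass the untagged value to v1
            return otherwise(tagged)          # other tag: pass the still-tagged value to v2
        return run
    return with_handlers


def close(tagged):
    # close has no reduction rule; raising models a stuck term (an assumption of this sketch).
    raise ValueError(f"no case for {tagged[0]!r}")


# A two-branch case construct built from two nested destructors and close.
area = destructor("Circle")(lambda r: 3 * r * r,
       destructor("Square")(lambda s: s * s, close))
```

Adding a branch amounts to wrapping one more destructor around the existing function, which is the extensibility referred to above. In the typed system, an application such as area applied to a value tagged Triangle would be rejected statically, since close only accepts arguments of type [∂Abs].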

4 Record Concatenation 

Static typing for record operations is a widely studied problem [4, 13]. Com- 
mon operations include selection, extension, restriction, and concatenation. The 
latter comes in two flavors: symmetric and asymmetric. The former requires its 
arguments to have disjoint sets of fields, whereas the latter gives precedence to 
the second one when a conflict occurs. 

Of these operations, concatenation is probably the most difficult to deal 
with, because its behavior varies according to the presence or absence of each 
field in its two arguments. This has led many authors to restrict their attention 
to type checking, and to not address the issue of type inference [7]. An inference 
algorithm for asymmetric concatenation was suggested by Wand [23]. He uses 
disjunctions of constraints, however, which gives his system exponential complexity.
Rémy [19] suggests an encoding of concatenation into $\lambda$-abstraction and
record extension, whence an inference algorithm may be derived. Unfortunately,
its power is somewhat decreased by subtle interactions with ML’s restricted
polymorphism; furthermore, the encoding is exposed to the user. In later work [20],
Rémy suggests a direct, constraint-based algorithm, which involves a special
form of constraints. Our approach is inspired by this work, but re-formulated
in terms of conditional constraints, thus showing that no ad hoc construct is
necessary.

Again, our presentation is in two steps. The basic case, where records only 
have one field, is tackled using subtyping and conditional constraints. Then, rows 
allow us to easily transfer our results to the case of multiple fields. 



4.1 The Basic Case 

We assume a language equipped with one-field records, whose unique field may 
be either “absent” or “present”. More precisely, we assume a constant data con- 
structor Abs, and a unary data constructor Pre; a “record” is a value built with 
one of these constructors. A data destructor, $\mathsf{Pre}^{-1}$, allows accessing the contents
of a non-empty record. Lastly, the language offers asymmetric and symmetric
concatenation primitives, written @ and @@, respectively.

\[ e ::= \ldots \mid \mathsf{Abs} \mid \mathsf{Pre} \mid \mathsf{Pre}^{-1} \mid @ \mid @@ \]

The relationship between record creation and record access is expressed by a
simple reduction rule:

\[ \mathsf{Pre}^{-1}\ (\mathsf{Pre}\ v) \quad\text{reduces to}\quad v \]

The semantics of asymmetric record concatenation is given as follows:

\[ v_1\ @\ \mathsf{Abs} \quad\text{reduces to}\quad v_1 \]
\[ v_1\ @\ (\mathsf{Pre}\ v_2) \quad\text{reduces to}\quad \mathsf{Pre}\ v_2 \]

(In each of these rules, the value $v_1$ is required to be a record.) Lastly, symmetric
concatenation is defined by

\[ \mathsf{Abs}\ @@\ v_2 \quad\text{reduces to}\quad v_2 \]
\[ v_1\ @@\ \mathsf{Abs} \quad\text{reduces to}\quad v_1 \]

(In these two rules, $v_1$ and $v_2$ are required to be records.)
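
These reduction rules can be transcribed directly. The sketch below assumes a hypothetical encoding of one-field records as tuples; the names ABS, pre, is_record, asym_concat and sym_concat are illustrative, and the exception raised when both fields are present models the absence of a reduction rule for that case.

```python
ABS = ("Abs",)


def pre(v):
    return ("Pre", v)


def is_record(v):
    return v == ABS or (isinstance(v, tuple) and len(v) == 2 and v[0] == "Pre")


def asym_concat(v1, v2):
    """v1 @ v2: the second record takes precedence when its field is present."""
    assert is_record(v1) and is_record(v2)
    return v1 if v2 == ABS else v2


def sym_concat(v1, v2):
    """v1 @@ v2: defined only when at least one of the two fields is absent."""
    assert is_record(v1) and is_record(v2)
    if v1 == ABS:
        return v2
    if v2 == ABS:
        return v1
    raise ValueError("no reduction rule applies: both fields are present")
```

Both operations are defined by cases over the constructors Abs and Pre, which is why the technique of Section 3 applies to them.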

The construction of our type algebra is similar to the one performed in Sec- 
tion 3.1. We introduce a (unary) record type constructor, as well as a distinction 
between regular types and field types: 

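\[ \tau ::= \alpha, \beta, \gamma, \ldots \mid \bot \mid \top \mid \tau \to \tau \mid \{\phi\} \]

\[ \phi ::= \varphi, \psi, \ldots \mid \mathsf{Bot} \mid \mathsf{Abs} \mid \mathsf{Pre}\ \tau \mid \mathsf{Either}\ \tau \mid \mathsf{Any} \]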

Let us explain, step by step, our definition of field types. Our first, natural step 
is to introduce type constructors Abs and Pre, which allow describing values 
built with the data constructors Abs and Pre. The former is a constant type 
constructor, while the latter is unary and covariant. 

Many type systems for record languages define $\mathsf{Pre}\ \tau$ to be a subtype of
Abs. This allows a record whose field is present to pretend it is not, leading 
to a classic theory of records whose fields may be “forgotten” via subtyping. 
However, when the language offers record concatenation, such a definition isn’t 
appropriate. Why? Concatenation - asymmetric or symmetric - involves a choice 
between two reduction rules, which is performed by matching one, or both, of the 
arguments against the data constructors Abs and Pre. If, at the level of types, 
we allow a non-empty record to masquerade as an empty one, then it becomes 
impossible, based on the arguments’ types, to find out which rule applies, and 
to determine the type of the operation’s result. In summary, in the presence of 
record concatenation, no subtyping relationship must exist between $\mathsf{Pre}\ \tau$ and
Abs. (This problem is well described - although not solved - in [4].) 

This leads us to making Abs and Pre incomparable. Once this choice has been 
made, completing the definition of field types is rather straightforward. Because 
our system requires type constructors to form a lattice, we define a least element 
Bot, and a greatest element Any. Lastly, we introduce a unary, covariant type
constructor, Either, which we define as the least upper bound of Abs and Pre,
so that $\mathsf{Abs} \sqcup (\mathsf{Pre}\ \tau)$ equals $\mathsf{Either}\ \tau$. This optional refinement allows us to keep
track of a field’s type, even when its presence is not ascertained. The lattice of
field types is shown in Figure 1.




Let us now assign types to the primitive operations offered by the language. 
Record creation and access receive their usual types: 

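\[ \mathsf{Abs} : \{\mathsf{Abs}\} \]
\[ \mathsf{Pre} : \alpha \to \{\mathsf{Pre}\ \alpha\} \]
\[ \mathsf{Pre}^{-1} : \{\mathsf{Pre}\ \alpha\} \to \alpha \]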

There remains to come up with correct, precise types for both flavors of record 
concatenation. The key idea is simple. As shown by its operational semantics, 
(either flavor of) record concatenation is really a function defined by cases over 
the data constructors Abs and Pre - and Section 3 has shown how to accurately 
describe such a function. Let us begin, then, with asymmetric concatenation: 

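\[ @ : \{\varphi_1\} \to \{\varphi_2\} \to \{\varphi_3\} \]
\[ \text{where}\quad \varphi_2 \le \mathsf{Either}\ \alpha_2 \]
\[ \phantom{\text{where}\quad} \mathsf{Abs} \le \varphi_2 \;?\; \varphi_1 \le \varphi_3 \]
\[ \phantom{\text{where}\quad} \mathsf{Pre} \le \varphi_2 \;?\; \mathsf{Pre}\ \alpha_2 \le \varphi_3 \]

Clearly, each conditional constraint mirrors one of the reduction rules. In the
second conditional constraint, we assume $\alpha_2$ is the type of the second record’s
field - if it has one. The first subtyping constraint represents this assumption.
Notice that we use $\mathsf{Pre}\ \alpha_2$, rather than $\varphi_2$, as the second branch’s result type;
this is strictly more precise, because $\varphi_2$ may be of the form $\mathsf{Either}\ \alpha_2$.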
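
As a rough illustration of how these constraints determine the result, the sketch below computes the least field type allowed for the result of @ when both argument field types are ground. It is not the solver of [15, 17]: field types are encoded as strings and pairs, the join on parameter types is supplied by the caller, and the names join, head and asym_result are hypothetical.

```python
def join(f1, f2, join_param):
    """Least upper bound in the lattice Bot < Abs, Pre t < Either t < Any."""
    if f1 == "Bot":
        return f2
    if f2 == "Bot":
        return f1
    if f1 == "Any" or f2 == "Any":
        return "Any"
    if f1 == "Abs" and f2 == "Abs":
        return "Abs"
    if f1 == "Abs":
        return ("Either", f2[1])
    if f2 == "Abs":
        return ("Either", f1[1])
    # Both are Pre or Either: stay at Pre only if both are Pre.
    kind = "Pre" if f1[0] == f2[0] == "Pre" else "Either"
    return (kind, join_param(f1[1], f2[1]))


def head(f):
    return f if isinstance(f, str) else f[0]


def asym_result(f1, f2, join_param):
    """Least f3 satisfying the constraints of @ for ground field types f1 and f2."""
    h = head(f2)
    if h == "Any":
        raise TypeError("the constraint f2 <= Either a2 has no solution")
    result = "Bot"
    if h in ("Abs", "Either"):            # Abs <= f2 triggers f1 <= f3
        result = join(result, f1, join_param)
    if h in ("Pre", "Either"):            # Pre <= f2 triggers Pre a2 <= f3
        result = join(result, ("Pre", f2[1]), join_param)
    return result


def flat_join(a, b):
    # Assumed join on parameter types: equal types join to themselves, others to "top".
    return a if a == b else "top"
```

For instance, when the second field has type Either int and the first has type Abs, both conditional constraints are triggered and the result is Either int; when the second field has type Pre int, the first argument's field type plays no role.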
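Lastly, we turn to symmetric concatenation:

\[ @@ : \{\varphi_1\} \to \{\varphi_2\} \to \{\varphi_3\} \]
\[ \text{where}\quad \mathsf{Abs} \le \varphi_1 \;?\; \varphi_2 \le \varphi_3 \]
\[ \phantom{\text{where}\quad} \mathsf{Abs} \le \varphi_2 \;?\; \varphi_1 \le \varphi_3 \]
\[ \phantom{\text{where}\quad} \mathsf{Pre} \le \varphi_1 \;?\; \varphi_2 \le \mathsf{Abs} \]
\[ \phantom{\text{where}\quad} \mathsf{Pre} \le \varphi_2 \;?\; \varphi_1 \le \mathsf{Abs} \]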

Again, each of the first two constraints mirrors a reduction rule. The last two 
constraints disallow the case where both arguments are non-empty records. (The 
careful reader will notice that any one of these two constraints would in fact 
suffice; both are kept for symmetry.) 

In both cases, the operation’s description in terms of constraints closely re- 
sembles its operational definition. Automatically deriving the former from the 
latter seems possible; this is an area for future research. 

4.2 The General Case 

We now move to a language with a denumerable set of record labels, written
$l$, $m$, etc. The language allows creating the empty record, as well as any
one-field record; it also offers selection and concatenation operations. Extension and
restriction can be easily added, if desired; we shall dispense with them.

\[ e ::= \emptyset \mid \{l = e\} \mid e.l \mid @ \mid @@ \]

We do not give the language’s semantics, which should hopefully be clear enough. 

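At the level of types, we again introduce rows of field types, denoted by $\rho$.
Furthermore, we introduce rows of regular types, denoted by $\varrho$. Lastly, we lift
the five field type constructors to the level of rows.

\[ \tau ::= \alpha, \beta, \gamma, \ldots \mid \bot \mid \top \mid \tau \to \tau \mid \{\rho\} \]
\[ \phi ::= \ldots \mid \mathsf{Bot} \mid \mathsf{Abs} \mid \mathsf{Pre}\ \tau \mid \mathsf{Either}\ \tau \mid \mathsf{Any} \]
\[ \varrho ::= \alpha, \beta, \gamma, \ldots \mid l : \tau;\ \varrho \mid \partial\tau \]
\[ \rho ::= \ldots \mid l : \phi;\ \rho \mid \partial\phi \mid \mathsf{Bot} \mid \mathsf{Abs} \mid \mathsf{Pre}\ \varrho \mid \mathsf{Either}\ \varrho \mid \mathsf{Any} \]

This allows writing complex constraints between rows, such as $\varphi \le \mathsf{Pre}\ \alpha$, where
$\varphi$ and $\alpha$ are row variables. A constraint between rows stands for an infinite family
of constraints between types, obtained component-wise. That is,

\[ (l : \varphi';\ \varphi'') \le \mathsf{Pre}\ (l : \alpha';\ \alpha'') \quad\text{stands for}\quad (\varphi' \le \mathsf{Pre}\ \alpha') \wedge (\varphi'' \le \mathsf{Pre}\ \alpha'') \]

We may now give types to the primitive record operations. Creation and
selection are easily dealt with:

\[ \emptyset : \{\partial\mathsf{Abs}\} \]
\[ \{l = \cdot\} : \alpha \to \{l : \mathsf{Pre}\ \alpha;\ \partial\mathsf{Abs}\} \]
\[ \cdot.l : \{l : \mathsf{Pre}\ \alpha;\ \partial\mathsf{Any}\} \to \alpha \]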
Interestingly, the types of both concatenation operations are unchanged from the
previous section - at least, syntactically. (For space reasons, we do not repeat 
them here.) A subtle difference lies in the fact that all variables involved must 
now be read as row variables, rather than as type variables. In short, the previous 
section exhibited constraints which describe concatenation, at the level of a single 
record field; here, the row machinery allows us to replicate these constraints over 
an infinite set of labels. This increase in power comes almost for free: it does not 
add any complexity to our notion of subtyping. 

5 Dynamic Messages 

So-called “dynamic” messages have recently received new attention in the static 
typing community. Bugliesi and Crafa [3] propose a higher-order type system 
which accounts for first-class messages. Nishimura [11] tackles the issue of type 
inference and suggests a second-order system à la Ohori [13]. Müller and Nishimura
[10] propose a simplified approach, based on an extended feature logic.

The problem consists in performing type inference for an object-oriented lan- 
guage where messages are first-class values, made up of a label and a parameter. 
Here, we view objects as records of functions, and messages as tagged values. 
(Better ways of modeling objects exist, but that is an independent issue.) Thus, 
we consider a language with records and data constructors, as described in Sec- 
tions 3.2 and 4.2. Furthermore, we let record labels and data constructors range 
over a single name space, that of message labels. (To save space, we choose to deal 
directly with the case of multiple message labels; however, our usual, two-step 
presentation would still be possible.) Lastly, we define a primitive message-send
operation, written $\#$, whose semantics is as follows:

\[ \#\ \{m = v_1;\ \ldots\}\ (m\ v_2) \quad\text{reduces to}\quad (v_1\ v_2) \]

In plain words, $\#$ examines its second argument, which must be some message
$m$ with parameter $v_2$. It then looks up the method named $m$ in the receiver
object, and applies the method’s code, $v_1$, to the message parameter.
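
At the level of values, this reduction rule is a lookup followed by an application. The sketch below assumes objects are modelled as dictionaries of functions and messages as (label, parameter) pairs; the names send, static_send and counter are hypothetical.

```python
def send(obj, message):
    """Dynamic message send: the label is part of the message value."""
    label, param = message
    return obj[label](param)


def static_send(label):
    """The family of static sends: one operation per constant label."""
    return lambda obj, param: obj[label](param)


# Hypothetical receiver object with two methods.
counter = {"inc": lambda n: n + 1, "show": lambda n: f"<{n}>"}
```

In send the label is only known at run time, whereas static_send fixes it in advance; the typing problem discussed next arises from the former.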

In a language with “static” messages, a message-send operation may only 
involve a constant message label. So, instead of a single message-send operation, 
a family thereof, indexed by message labels, is provided. In fact, in our simple
model, these operations are definable within the language. The operation $\#m$,
which allows sending the message $m$ to some object $o$ with parameter $p$, may be
defined as $\lambda o.\lambda p.(o.m\ p)$. Then, type inference yields

\[ \#m : \{m : \mathsf{Pre}\ (\alpha \to \beta);\ \partial\mathsf{Any}\} \to \alpha \to \beta \]

Because the message label, $m$, is statically known, it may be explicitly mentioned
in the type scheme, making it easy to require the receiver object to carry an
appropriate method. In a language with “dynamic” messages, on the other hand,
$m$ is no longer known. The problem thus appears more complex; it has, in fact,
sparked the development of special-purpose constraint languages [10]. Yet, the 
machinery introduced so far in this paper suffices to solve it. 
Consider the partial application of the message-send primitive $\#$ to some
record $r$. It is a function which accepts some tagged value $(m\ v)$, then invokes an
appropriate piece of code, selected according to the label $m$. This should ring a
bell - it is merely a form of pattern matching, which this paper has extensively
discussed already. Therefore, we propose

\[ \# : \{\varphi\} \to [\psi] \to \beta \]
\[ \text{where}\quad \psi \le \mathsf{Pre}\ \alpha \]
\[ \phantom{\text{where}\quad} \mathsf{Pre} \le \psi \;?\; \varphi \le \mathsf{Pre}\ (\alpha \to \partial\beta) \]

(Here, all variables except $\beta$ are row variables.) The operation’s first (resp.
second) argument is required to be an object (resp. a message), whose contents
(resp. possible values) are described by the row variable $\varphi$ (resp. $\psi$). The first
constraint merely lets $\alpha$ stand for the message parameter’s type. The conditional
constraint, which involves two row terms, should again be understood as a family,
indexed by message labels, of conditional constraints between record field types.
The conditional constraint associated with some label $m$ shall be triggered only
if $\psi$’s element at index $m$ is of the form Pre _, i.e. only if the message’s label may
be $m$. When it is triggered, its right-hand side becomes active, with a three-fold
effect. First, $\varphi$’s element at index $m$ must be of the form Pre (_ $\to$ _), i.e. the
receiver object must carry a method labeled $m$. Second, the method’s argument
type must be (a supertype of) $\alpha$’s element at label $m$, i.e. the method must be
able to accept the message’s parameter. Third, the method’s result type must
be (a subtype of) $\beta$, i.e. the whole operation’s result type must be (at least) the
join of all potentially invoked methods’ return types.

Our proposal shows that type inference for “dynamic” messages requires 
no dedicated theoretical machinery. It also shows that “dynamic” messages are 
naturally compatible with all operations on records, including concatenation - 
a question which was left unanswered by Nishimura [11]. 

6 Conclusion 

In this paper, we have advocated enriching an existing constraint-based type
inference framework [15] with rows [18] and conditional constraints [2]. This
provides a single (and simple) solution to several difficult type inference
problems, each of which seemed to require, until now, special forms of constraints.
From a practical point of view, it allows them to benefit from known constraint
simplification techniques [17], leading to an efficient inference algorithm [14].

We believe our system subsumes Rémy's proposal for record concatenation [20],
as well as Müller and Nishimura's view of “dynamic” messages [10].
Aiken, Wimmers and Lakshman's “soft” type system [2] is more precise than
ours, because it interprets constraints in a richer logical model, but otherwise
offers similar features. In fact, the ideas developed in this paper could have been
presented in the setting of Bane [5], or, more generally, of any system which
allows writing sufficiently expressive constrained type schemes.
A Rules

This appendix gives a short description of the system's type inference rules
(Figure 2). Even though only the core language is explicitly treated, these rules
are sufficient to deal with a full-featured programming language. Indeed, any
extra language construct may be viewed either as syntactic sugar, or as a new
primitive operation, which can be bound in an initial typing environment $\Gamma_0$.
Also, note that these type inference rules use neither conditional constraints, nor
rows; these will come only from $\Gamma_0$.

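For simplicity, we distinguish identifiers bound by λ, denoted x, y, ..., from
those bound by let, denoted X, Y, ... Furthermore, we expect λ-identifiers to
be unique; that is, each λ-identifier must be bound at most once in a given
program. Lastly, in every expression of the form let X = $e_1$ in $e_2$, we require
X to appear free within $e_2$. It would be easy to overcome these restrictions, at
the expense of heavier notation.

$$\dfrac{\alpha\ \text{fresh}}{\Gamma \vdash_i x : \forall\emptyset.\ \langle x : \alpha\rangle \Rightarrow \alpha}\ (\mathrm{Var}_i)$$

$$\dfrac{\Gamma \vdash_i e : \forall C.\ A \Rightarrow \tau' \qquad A(x) = \tau}{\Gamma \vdash_i \lambda x.e : \forall C.\ (A \setminus x) \Rightarrow \tau \to \tau'}\ (\mathrm{Abs}_i)$$

$$\dfrac{\Gamma \vdash_i e_1 : \forall C_1.\ A_1 \Rightarrow \tau_1 \qquad \Gamma \vdash_i e_2 : \forall C_2.\ A_2 \Rightarrow \tau_2 \qquad \alpha\ \text{fresh} \qquad C = C_1 \cup C_2 \cup \{\tau_1 \le \tau_2 \to \alpha\}}{\Gamma \vdash_i e_1\ e_2 : \forall C.\ (A_1 \sqcap A_2) \Rightarrow \alpha}\ (\mathrm{App}_i)$$

$$\dfrac{\Gamma(X) = \sigma \qquad \rho\ \text{fresh renaming of}\ \sigma}{\Gamma \vdash_i X : \rho(\sigma)}\ (\mathrm{LetVar}_i)$$

$$\dfrac{\Gamma \vdash_i e_1 : \sigma_1 \qquad \Gamma[X \mapsto \sigma_1] \vdash_i e_2 : \sigma_2}{\Gamma \vdash_i \mathtt{let}\ X = e_1\ \mathtt{in}\ e_2 : \sigma_2}\ (\mathrm{Let}_i)$$

Fig. 2. Type inference rules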

The rules are fairly straightforward. The main point of interest is the way
each application node produces a subtyping constraint. The only peculiarity is
in the way type environments are dealt with. The environment $\Gamma$, which appears
on the left of the turnstile, is a list of bindings of the form $X : \sigma$. Type schemes
are slightly more complex than initially shown in Section 2. They are, in fact,
of the form $\sigma ::= \forall C.\ A \Rightarrow \tau$, where the context A is a set of bindings of the
form $x : \tau$. The point of such a formulation is to obtain a system where no type
scheme has free type variables. This allows a simpler theoretical description of
constraint simplification.

As far as notation is concerned, $\langle x : \alpha\rangle$ represents a context consisting of a
single entry, which binds x to $\alpha$. $A \setminus x$ is the context obtained by removing x's
binding from A, if it exists. For the sake of readability, we have abused notation
slightly. In rule ($\mathrm{Abs}_i$), A(x) stands for the type associated with x in A, if A
contains a binding for x; it stands for $\top$ otherwise. In rule ($\mathrm{App}_i$), $A_1 \sqcap A_2$
represents the point-wise intersection of $A_1$ and $A_2$. That is, whenever x has
a binding in $A_1$ or $A_2$, its binding in $A_1 \sqcap A_2$ is $A_1(x) \sqcap A_2(x)$. Because we
do not have intersection types, this expression should in fact be understood as
a fresh type variable, accompanied by an appropriate conjunction of subtyping
constraints.

The rules implicitly require every constraint set to admit at least one solution. 
Constraint solving and simplification are described in [15, 17]. 
B Examples

Example 1. We define a function which reads field l out of a record r, returning a
default value d if r has no such field, by setting extract = λd.λr.({l = d} @ r).l.
In our system, extract's inferred type is

$$\mathit{extract} : \alpha \to \{l : \varphi;\ \psi\} \to \gamma$$

$$\begin{array}{lll}
\text{where} & \varphi \le \mathrm{Either}\ \beta & \psi \le \mathrm{Either}\ \varepsilon \\
 & \mathrm{Abs} \le \varphi\ ?\ \alpha \le \gamma & \mathrm{Abs} \le \psi\ ?\ \mathrm{Abs} \le \mathrm{Any} \\
 & \mathrm{Pre} \le \varphi\ ?\ \beta \le \gamma & \mathrm{Pre} \le \psi\ ?\ \mathrm{Pre}\ \varepsilon \le \mathrm{Any}
\end{array}$$

The first constraint retrieves r.l's type and names it $\beta$, regardless of the field's
presence. (If the field turns out to be absent, $\beta$ will be unconstrained.) The
left-hand conditional constraints clearly specify the dependency between the field's
presence and the function's result.

The right-hand conditional constraints have tautologous conclusions; therefore,
they are superfluous. They remain only because our current constraint
simplification algorithms are “lazy” and ignore any conditional constraints whose
condition has not yet been fulfilled. This problem could be fixed by implementing
slightly more aggressive simplification algorithms.

The type inferred for extract 0 {l = 1} and extract 0 {m = 1} is int. Thus,
in many cases, one need not be aware of the complexity hidden in extract's type.
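
The run-time behaviour of extract can be sketched in Python as follows, assuming (as the definition of extract implies) that in a concatenation r1 @ r2 the fields of r2 take precedence. Records are modelled as dictionaries and the field name l as the string "l"; the helper name `concat` is an assumption of this sketch, not the paper's notation.

```python
def concat(r1: dict, r2: dict) -> dict:
    """Asymmetric record concatenation r1 @ r2: fields of r2 win on overlap."""
    return {**r1, **r2}


def extract(d):
    """λd.λr.({l = d} @ r).l, with records as dicts and label l as "l"."""
    return lambda r: concat({"l": d}, r)["l"]


present = extract(0)({"l": 1})  # field l is present: its value is returned
absent = extract(0)({"m": 1})   # field l is absent: the default is returned
```

Both results are integers, which mirrors the observation that both applications receive type int even though the field's presence differs.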

Example 2. We assume given an object o, of the following type:

o : { getText : Pre (unit → string); setText : Pre (string → unit);
      select : Pre (int × int → unit); ∂Abs }

o may represent, for instance, an editable text field in a graphic user interface
system. Its methods allow programmatically getting and setting its contents, as
well as selecting a portion of text.

Next, we assume a list data structure, equipped with a simple iterator:

iter : (α → unit) → α list → unit

The following expression creates a list of messages, and uses iter to send each of
them in turn to o:

iter (# o) [ setText “Hello!”; select (0, 5) ]

This expression is well-typed, because o contains appropriate methods to deal
with each of these messages, and because these methods return unit, as expected
by iter. The expression's type is of course unit, iter's return type.

Here is a similar expression, which involves a getText message:

iter (# o) [ setText “Hello!”; getText () ]

This time, it is ill-typed. Indeed, sending a setText message to o produces a
result of type unit, while sending it a getText message produces a result of type
string. Thus, (# o)'s result type must be ⊤, the join of these types. This makes
(# o) an unacceptable argument for iter, since the latter expects a function
whose return type is unit.
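
To make the difference between the two message lists concrete, here is a small untyped Python sketch. The constructor `make_text_field`, the helper `send` and the function `iter_` are hypothetical stand-ins for o, # and iter; Python does not reject the second list, so the sketch only exhibits the mixed result types that the type system detects statically.

```python
from typing import Any, Callable, Dict, List, Tuple

Message = Tuple[str, Any]


def make_text_field() -> Dict[str, Callable[[Any], Any]]:
    """A hypothetical text-field object: a record of three methods."""
    state = {"text": "", "selection": (0, 0)}

    def get_text(_unit: None) -> str:
        return state["text"]

    def set_text(s: str) -> None:
        state["text"] = s

    def select(span: Tuple[int, int]) -> None:
        state["selection"] = span

    return {"getText": get_text, "setText": set_text, "select": select}


def send(o: Dict[str, Callable[[Any], Any]], msg: Message) -> Any:
    label, parameter = msg
    return o[label](parameter)


def iter_(f: Callable[[Message], None], xs: List[Message]) -> None:
    """Stand-in for iter: apply f to each element, discarding results."""
    for x in xs:
        f(x)


o = make_text_field()
iter_(lambda m: send(o, m), [("setText", "Hello!"), ("select", (0, 5))])

# Result types of each send: uniform for the first list, mixed for the second.
good = [type(send(o, m)) for m in [("setText", "Hello!"), ("select", (0, 5))]]
bad = [type(send(o, m)) for m in [("setText", "Hello!"), ("getText", None)]]
```

In the first list every send returns the unit-like value None; in the second, one send returns a string, which corresponds to the join ⊤ in the typing argument above.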
Abstract. Standard ML is a statically typed programming language
that is suited for the construction of both small and large programs.
“Programming in the small” is captured by Standard ML's Core language.
“Programming in the large” is captured by Standard ML's Modules
language that provides constructs for organising related Core language
definitions into self-contained modules with descriptive interfaces. While
the Core is used to express details of algorithms and data structures,
Modules is used to express the overall architecture of a software system. The
Modules and Core languages are stratified in the sense that modules may
not be manipulated as ordinary values of the Core. This is a limitation,
since it means that the architecture of a program cannot be reconfigured
according to run-time demands. We propose a novel extension of the
language that allows modules to be manipulated as first-class values of
the Core language. The extension greatly extends the expressive power
of the language and has been shown to be compatible with both Core
type inference and a separate extension to higher-order modules.



1 Introduction 

Standard ML [10] is a high-level programming language that is suited for the 
construction of both small and large programs. 

Standard ML’s general-purpose Core language supports “programming in 
the small” with a rich range of types and computational constructs that includes 
recursive types and functions, control constructs, exceptions and references. 

Standard ML's special-purpose Modules language supports “programming
in the large”. Constructed on top of the Core, the Modules language allows
definitions of identifiers denoting Core language types and terms to be
packaged together into possibly nested structures, whose components are accessed
by the dot notation. Structures are transparent: by default, the realisation (i.e.
implementation) of a type component within a structure is evident outside the
structure. Signatures are used to specify the types of structures, by specifying
their individual components. A type component may be specified opaquely,
permitting a variety of realisations, or transparently, by equating it with a particular
Core type. A structure matches a signature if it provides an implementation for
all of the specified components, and, thanks to subtyping, possibly more. A
signature may be used to opaquely constrain a matching structure. This existentially
quantifies over the actual realisation of type components that have opaque
specifications in the signature, effectively hiding their implementation. A functor
definition defines a polymorphic function mapping structures to structures. A
functor may be applied to any structure that realises a subtype of the formal
argument's type, resulting in a concrete implementation of the functor body.

* This research was completed at the LFCS, Division of Informatics, University of
Edinburgh under EPSRC grant GR/K63795. Thanks to Don Sannella, Healfdene
Goguen and the anonymous referees.

Despite the flexibility of the Modules type system, the notion of computation
at the Modules level is actually very weak, permitting only functor application, to 
model the linking of structures, and projection, to provide access to a structure’s 
components. Moreover, the stratification between Core and Modules means that 
the stronger computational mechanisms of the Core cannot be exploited in the 
construction of structures. This severe limitation means that the architecture of 
a program cannot be reconfigured according to run-time demands. For instance, 
we cannot dynamically choose between the various back-ends of a cross compiler, 
if those back-ends are implemented as separate structures. 

In this paper, we relax the Core/Modules stratification, allowing structures to 
be manipulated as first-class citizens of the Core language. Our extension allows 
structures to be passed as arguments to Core functions, returned as results of 
Core computations, stored in Core data structures and so on. 

For presentation purposes, we formulate our extension for a representative toy 
language called Mini-SML. The static semantics of Mini-SML is based directly 
on that of Standard ML. Mini-SML includes the essential features of Standard 
ML Modules but, for brevity, only has a simple Core language of explicitly typed, 
monomorphic functions ([16] treats a Standard ML-like Core). Section 2 intro- 
duces the syntax of Mini-SML. Section 3 gives a motivating example to illustrate 
the limitations of the Core/Modules stratification. Section 4 reviews the static 
semantics of Mini-SML. Section 5 defines our extension to first-class structures. 
Section 6 revisits the motivating example to show the utility of our extension. 
Section 7 presents a different example to demonstrate that Mini-SML becomes 
more expressive with our extension. Section 8 discusses our contribution. 



2 The Syntax of Mini-SML 

The type and term syntax of Mini-SML is defined by the grammar in Figures 1
and 2, where t ∈ TypId, x ∈ ValId, X ∈ StrId, F ∈ FunId and T ∈ SigId range
over disjoint sets of type, value, structure, functor and signature identifiers.

A core type u may be used to define a type identifier or to specify the type of 
a Core value. These are just the types of a simple functional language, extended 
with the projection sp.t of a type component from a structure path. A signature 
body B is a sequential specification of a structure’s components. A type com- 
ponent may be specified transparently, by equating it with a type, or opaquely, 
permitting a variety of realisations. Transparent specifications may be used to 
express type sharing constraints in the usual way. Value and structure compo- 
nents are specified by their type and signature. The specifications in a body 
    Core Types             u ::= t                      type identifier
                               | u → u' | int           function space, integers
                               | sp.t                   type projection
    Signature Bodies       B ::= type t = u; B          transparent type specification
                               | type t; B              opaque type specification
                               | val x : u; B           value specification
                               | structure X : S; B     structure specification
                               | ε_B                    empty body
    Signature Expressions  S ::= sig B end              encapsulated body
                               | T                      signature identifier

Fig. 1. Type Syntax of Mini-SML



    Core Expressions       e ::= x                             value identifier
                               | λx : u.e | e e'               function, application
                               | i | ifzero e then e' else e'' integer, zero test
                               | fix e                         fixpoint of e (recursion)
                               | sp.x                          value projection
    Structure Paths       sp ::= X                             structure identifier
                               | sp.X                          structure projection
    Structure Bodies       b ::= type t = u; b                 type definition
                               | val x = e; b                  value definition
                               | structure X = s; b            structure definition
                               | functor F (X : S) = s; b      functor definition
                               | signature T = S; b            signature definition
                               | ε_b                           empty body
    Structure Expressions  s ::= sp                            structure path
                               | struct b end                  structure body
                               | F(s)                          functor application
                               | s :> S                        opaque constraint

Fig. 2. Term Syntax of Mini-SML



are dependent in that subsequent specifications may refer to previous ones. A 
signature expression S encapsulates a body, or is a reference to a bound sig- 
nature identifier. A structure matches a signature expression if it provides an 
implementation for all of the specified components, and possibly more. 

Core expressions e describe a simple functional language extended with the 
projection of a value identifier from a structure path. A structure path sp is a 
reference to a bound structure identifier, or the projection of one of its sub- 
structures. A structure body b is a dependent sequence of definitions: subsequent 
definitions may refer to previous ones. A type definition abbreviates a type. Va- 
lue and structure definitions bind term identifiers to the values of expressions. A 
functor definition introduces a named function on structures: X is the functor’s 
formal argument, S specifies the argument’s type, and s is the functor’s body 
that may refer to X. The functor may be applied to any argument that matches 
S. A signature definition abbreviates a signature. A structure expression s eva- 
luates to a structure. It may be a path or an encapsulated structure body, whose 
    signature Stream = sig type nat = int; type state;
                           val start: state;
                           val next: state → state;
                           val value: state → nat
                       end;
    structure TwoOnwards = struct type nat = int; type state = nat;
                                  val start = 2;
                                  val next = λs:state. succ s;
                                  val value = λs:state. s
                           end;
    signature State = Stream;
    structure Start = TwoOnwards :> State;
    functor Next (S: State) =
      struct type nat = S.nat; type state = S.state;
             val filter = fix λfilter:state→state.
                            λs:state. ifzero mod (S.value s) (S.value S.start)
                                      then filter (S.next s) else s;
             val start = filter S.start;
             val next = λs:state. filter (S.next s);
             val value = S.value
      end;
    functor Value (S: State) = struct val value = S.value (S.start) end

Fig. 3. Using structures to implement streams and a stratified, but useless, Sieve.



type, value and structure definitions (but not functor or signature definitions)
become the components of the structure. The application of a functor evaluates
its body with respect to the value of the actual argument. An opaque constraint
restricts the visibility of the structure's components to those specified in the
signature, which the structure must match, and hides the actual realisations of type
components with opaque specifications, introducing new abstract types.

By supporting local functor and signature definitions, structure bodies can
play the role of Standard ML's separate top-level syntax. [18] formalises recursive
datatypes, local structure definitions and transparent signature constraints.



3 Motivating Example: The Sieve of Eratosthenes 

We can illustrate the limitations of the Core/Modules stratification of Mini-SML 
(and Standard ML) by attempting to implement the Sieve of Eratosthenes using 
Modules level structures as the fundamental “data structure” . It is a moot point 
that the Sieve can be coded directly in the Core: our aim is to highlight the 
shortcomings of second-class modules. The example is adapted from [12]. 

The Sieve is a well-known algorithm for calculating the infinite list, or stream,
of prime (natural) numbers. We can represent such a stream as a “process”,
defined by a specific representation nat of the set of natural numbers, an unspecified
set state of internal states, a designated initial or start state, a transition function
taking us from one state to the next state, and a function value returning the
natural number associated with each state. Reading the values off the process's
sequence of states yields the stream.

Given a stream s, let sift(s) be the substream of s consisting of those values
not divisible by the initial value of s. Viewed as a process, the states of sift(s)
are just the states of s, filtered by the removal of any states whose values are
divisible by the value of s's start state. The stream of primes is obtained by
taking the initial value of each stream in the sequence of streams:

    twoonwards             = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, ...
    sift(twoonwards)       = 3, 5, 7, 9, 11, ...
    sift(sift(twoonwards)) = 5, 7, 11, ...
    ...



The Sieve of Eratosthenes represents this construction as the following
process. The states of the Sieve are streams. The Sieve's start state is the stream
twoonwards. The next state of the Sieve is obtained by sifting the current state.
The value of each state of the Sieve is the first value of that state viewed as a
stream. Observe that our description of the Sieve also describes a stream.
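
The process view of streams, and the Sieve as a stream whose states are themselves streams, can be sketched in Python, where such records are ordinary values. This is only an illustration under that assumption: the class `Stream` and the helpers `sift`, `take` and `sieve` are names chosen for the sketch, and the snippet says nothing about Mini-SML's typing.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class Stream:
    """A stream as a process: a start state, a transition, an observation."""
    start: Any
    next: Callable[[Any], Any]
    value: Callable[[Any], int]


two_onwards = Stream(start=2, next=lambda s: s + 1, value=lambda s: s)


def sift(s: Stream) -> Stream:
    """Drop every state whose value is divisible by the first value of s."""
    divisor = s.value(s.start)

    def skip(state: Any) -> Any:
        while s.value(state) % divisor == 0:
            state = s.next(state)
        return state

    return Stream(start=skip(s.start),
                  next=lambda state: skip(s.next(state)),
                  value=s.value)


# The Sieve is itself a stream: its states are streams, its transition is
# sift, and the value of a state is that stream's first value.
sieve = Stream(start=two_onwards, next=sift,
               value=lambda s: s.value(s.start))


def take(s: Stream, n: int) -> List[int]:
    """Read off the values of the first n states of s."""
    values, state = [], s.start
    for _ in range(n):
        values.append(s.value(state))
        state = s.next(state)
    return values


first_primes = take(sieve, 5)
sifted_once = take(sift(two_onwards), 5)
```

Because `take` is an ordinary function parameterised by n, it can reach the nth state of the Sieve for any n, which is exactly what requires states to be first-class values.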

Consider the code in Fig. 3. Given our description of streams as processes, 
it seems natural to use structures matching the signature Stream to implement 
streams: e.g. the structure TwoOnwards implements the stream twoonwards . The 
remaining code constructs an implementation of the Sieve. The states of the Sieve 
are structures matching the signature State (i.e. Stream). The Start state of 
the Sieve is the structure TwoOnwards. The functor Next takes a structure S 
matching State and returns a sifted structure that also matches State. The 
functor Value returns the value of a state of the Sieve, by returning the initial 
value of the state viewed as a stream. 

Now we can indeed calculate the value of the nth prime (counting from 0):

    structure NthValue = Value(Next(···Next(Start)···));
    val nthprime = NthValue.value

by chaining n applications of the functor Next to Start and
then extracting the resulting value. The problem is that we can only do this for
a fixed n: because of the stratification of Core and Modules, it is impossible to
implement the mathematical function that returns the nth state of the Sieve
for an arbitrary n. It cannot be implemented as a Core function, even though
the Core supports iteration, because the states of the Sieve are structures that
do not belong to the Core language. It cannot be implemented as a Modules
functor, because the computation on structures is limited to functor application
and projection, which is too weak to express iteration. This means that our
implementation of the Sieve is useless.

Notice also that, in this implementation, the components of the Sieve do not
describe a stream in the sense of the signature Stream: the states of the Sieve
are structures, not values of the Core, and the state transition function is a
functor, not a Core function. Our implementation fails to capture the impredicative
description of the Sieve as a stream constructed from streams.
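$$\begin{array}{rcll}
\alpha &\in& \mathit{Var} = \{\alpha, \beta, \delta, \gamma, \ldots\} & \text{type variables} \\
P, Q &\in& \mathit{VarSet} \stackrel{\mathrm{def}}{=} \mathrm{Fin}(\mathit{Var}) & \text{sets of type variables} \\
u &\in& \mathit{Type} ::= \alpha \mid u \to u' \mid \mathrm{int} & \text{type variable, function space, integers} \\
\varphi &\in& \mathit{Real} \stackrel{\mathrm{def}}{=} \mathit{Var} \xrightarrow{\mathrm{fin}} \mathit{Type} & \text{realisations} \\
\mathcal{S} &\in& \mathit{Str} \stackrel{\mathrm{def}}{=} \left\{ \mathcal{S}_t \cup \mathcal{S}_x \cup \mathcal{S}_X \;\middle|\; \begin{array}{l} \mathcal{S}_t \in \mathit{TypId} \xrightarrow{\mathrm{fin}} \mathit{Type}, \\ \mathcal{S}_x \in \mathit{ValId} \xrightarrow{\mathrm{fin}} \mathit{Type}, \\ \mathcal{S}_X \in \mathit{StrId} \xrightarrow{\mathrm{fin}} \mathit{Str} \end{array} \right\} & \text{semantic structures} \\
\mathcal{L} &\in& \mathit{Sig} ::= \Lambda P.\mathcal{S} & \text{semantic signatures} \\
\mathcal{X} &\in& \mathit{ExStr} ::= \exists P.\mathcal{S} & \text{existential structures} \\
\mathcal{F} &\in& \mathit{Fun} ::= \forall P.\mathcal{S} \to \mathcal{X} & \text{semantic functors} \\
\mathcal{C} &\in& \mathit{Context} \stackrel{\mathrm{def}}{=} \left\{ \mathcal{C}_t \cup \mathcal{C}_T \cup \mathcal{C}_x \cup \mathcal{C}_X \cup \mathcal{C}_F \;\middle|\; \begin{array}{l} \mathcal{C}_t \in \mathit{TypId} \xrightarrow{\mathrm{fin}} \mathit{Type}, \\ \mathcal{C}_T \in \mathit{SigId} \xrightarrow{\mathrm{fin}} \mathit{Sig}, \\ \mathcal{C}_x \in \mathit{ValId} \xrightarrow{\mathrm{fin}} \mathit{Type}, \\ \mathcal{C}_X \in \mathit{StrId} \xrightarrow{\mathrm{fin}} \mathit{Str}, \\ \mathcal{C}_F \in \mathit{FunId} \xrightarrow{\mathrm{fin}} \mathit{Fun} \end{array} \right\} & \text{semantic contexts}
\end{array}$$

Notation. For sets A and B, Fin(A) denotes the set of finite subsets of A, and $A \xrightarrow{\mathrm{fin}} B$
denotes the set of finite maps from A to B. Let f and g be finite maps. $\mathcal{D}(f)$ denotes
the domain of definition of f. The finite map f + g has domain $\mathcal{D}(f) \cup \mathcal{D}(g)$ and
values $(f + g)(a) \stackrel{\mathrm{def}}{=}$ if $a \in \mathcal{D}(g)$ then $g(a)$ else $f(a)$.

Fig. 4. Semantic Objects of Mini-SML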



Once we allow structures as first-class citizens of the Core language, these 
problems disappear. 



4 Review: The Static Semantics of Mini-SML 

Before we can propose our extension, we need to present the static semantics, 
or typing judgements, of Mini-SML. Following Standard ML [10], the static 
semantics of Mini-SML distinguishes syntactic types of the language from their 
semantic counterparts, called semantic objects. Semantic objects play the role of 
types in the semantics. Figure 4 defines the semantic objects of Mini-SML. We 
let O range over all semantic objects. 

Type variables $\alpha \in \mathit{Var}$ are just variables ranging over semantic types $u \in
\mathit{Type}$. The latter are the semantic counterparts of syntactic Core types, and
are used to record the denotations of type identifiers and the types of value
identifiers. The symbols $\Lambda$, $\exists$ and $\forall$ are used to bind finite sets of type variables.

A realisation $\varphi \in \mathit{Real}$ maps type variables to semantic types and defines
a substitution on type variables in the usual way. The operation of applying a
realisation $\varphi$ to an object O is written $\varphi(O)$.

Semantic structures $\mathcal{S} \in \mathit{Str}$ are used as the types of structure identifiers and
paths. A semantic structure maps type components to the types they denote,




342 



C.V. Russo 



and value and structure components to the types they inhabit. For clarity, we
define the extension functions $\mathsf{t} \triangleright u, \mathcal{S} \stackrel{\mathrm{def}}{=} \{\mathsf{t} \mapsto u\} + \mathcal{S}$, $\mathsf{x} : u, \mathcal{S} \stackrel{\mathrm{def}}{=} \{\mathsf{x} \mapsto u\} + \mathcal{S}$,
and $\mathsf{X} : \mathcal{S}, \mathcal{S}' \stackrel{\mathrm{def}}{=} \{\mathsf{X} \mapsto \mathcal{S}\} + \mathcal{S}'$, and let $\epsilon_\mathcal{S}$ denote the empty structure $\emptyset$.

A semantic signature $\Lambda P.\mathcal{S}$ is a parameterised type: it describes the family
of structures $\varphi(\mathcal{S})$, for $\varphi$ a realisation of the parameters in P.

The existential structure $\exists P.\mathcal{S}$, on the other hand, is a quantified type:
variables in P are existentially quantified in $\mathcal{S}$ and thus abstract. Existential
structures describe the types of structure bodies and expressions. Existentially quantified
type variables are explicitly introduced by opaque constraints s :> S, and
implicitly eliminated at various points in the static semantics.

A semantic functor $\forall P.\mathcal{S} \to \mathcal{X}$ describes the type of a functor identifier: the
universally quantified variables in P are bound simultaneously in the functor's
domain, $\mathcal{S}$, and its range, $\mathcal{X}$. These variables capture the type components of the
domain on which the functor behaves polymorphically; their possible occurrence
in the range caters for the propagation of type identities from the functor's actual
argument: functors are polymorphic functions on structures.

A context $\mathcal{C}$ maps type and signature identifiers to the types and signatures
they denote, and maps value, structure and functor identifiers to the types they
inhabit. For clarity, we define the extension functions $\mathcal{C}, \mathsf{t} \triangleright u \stackrel{\mathrm{def}}{=} \mathcal{C} + \{\mathsf{t} \mapsto u\}$,
$\mathcal{C}, \mathsf{T} \triangleright \mathcal{L} \stackrel{\mathrm{def}}{=} \mathcal{C} + \{\mathsf{T} \mapsto \mathcal{L}\}$, $\mathcal{C}, \mathsf{x} : u \stackrel{\mathrm{def}}{=} \mathcal{C} + \{\mathsf{x} \mapsto u\}$, $\mathcal{C}, \mathsf{X} : \mathcal{S} \stackrel{\mathrm{def}}{=} \mathcal{C} + \{\mathsf{X} \mapsto \mathcal{S}\}$, and
$\mathcal{C}, \mathsf{F} : \mathcal{F} \stackrel{\mathrm{def}}{=} \mathcal{C} + \{\mathsf{F} \mapsto \mathcal{F}\}$.

We let $\mathcal{V}(O)$ denote the set of variables occurring free in O, where the notions
of free and bound variable are defined as usual. Furthermore, we identify
semantic objects that differ only in a renaming of bound type variables ($\alpha$-conversion).

The operation of applying a realisation to a type (substitution) is extended
to all semantic objects in the usual, capture-avoiding way.

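Definition 1 (Enrichment Relation) Given two structures $\mathcal{S}$ and $\mathcal{S}'$, $\mathcal{S}$
enriches $\mathcal{S}'$, written $\mathcal{S} \succeq \mathcal{S}'$, if and only if $\mathcal{D}(\mathcal{S}) \supseteq \mathcal{D}(\mathcal{S}')$ and

- for all $\mathsf{t} \in \mathcal{D}(\mathcal{S}')$, $\mathcal{S}(\mathsf{t}) = \mathcal{S}'(\mathsf{t})$,

- for all $\mathsf{x} \in \mathcal{D}(\mathcal{S}')$, $\mathcal{S}(\mathsf{x}) = \mathcal{S}'(\mathsf{x})$, and

- for all $\mathsf{X} \in \mathcal{D}(\mathcal{S}')$, $\mathcal{S}(\mathsf{X}) \succeq \mathcal{S}'(\mathsf{X})$.

Enrichment is a pre-order that defines a subtyping relation on semantic
structures (i.e. $\mathcal{S}$ is a subtype of $\mathcal{S}'$ if and only if $\mathcal{S} \succeq \mathcal{S}'$).

Definition 2 (Functor Instantiation) A semantic functor $\forall P.\mathcal{S} \to \mathcal{X}$
instantiates to a functor instance $\mathcal{S}' \to \mathcal{X}'$, written $\forall P.\mathcal{S} \to \mathcal{X} > \mathcal{S}' \to \mathcal{X}'$, if and
only if $\varphi(\mathcal{S}) = \mathcal{S}'$ and $\varphi(\mathcal{X}) = \mathcal{X}'$, for some realisation $\varphi$ with $\mathcal{D}(\varphi) = P$.

Definition 3 (Signature Matching) A semantic structure $\mathcal{S}$ matches a
signature $\Lambda P.\mathcal{S}'$ if and only if $\mathcal{S} \succeq \varphi(\mathcal{S}')$ for some realisation $\varphi$ with $\mathcal{D}(\varphi) = P$.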
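
Definitions 1 and 3 can be made concrete with a small Python sketch. The encoding is an assumption of the sketch, not the paper's: a semantic structure is a dictionary with "types", "vals" and "strs" components, the type int is the string "int", a function type is a tuple ("->", dom, cod), and type variables are strings such as "?state". The function `matches` further assumes that every parameter of the signature is the denotation of a top-level opaque type component, which is how it finds a candidate realisation.

```python
def realise(phi, u):
    """Apply a realisation (a dict from type variables to types) to a type."""
    if isinstance(u, tuple):
        arrow, dom, cod = u
        return (arrow, realise(phi, dom), realise(phi, cod))
    return phi.get(u, u)


def realise_str(phi, s):
    """Apply a realisation to every component of a structure."""
    return {
        "types": {t: realise(phi, u) for t, u in s["types"].items()},
        "vals": {x: realise(phi, u) for x, u in s["vals"].items()},
        "strs": {X: realise_str(phi, sub) for X, sub in s["strs"].items()},
    }


def enriches(s, s2):
    """Definition 1: s has every component of s2, with agreeing types."""
    return (
        all(t in s["types"] and s["types"][t] == u
            for t, u in s2["types"].items())
        and all(x in s["vals"] and s["vals"][x] == u
                for x, u in s2["vals"].items())
        and all(X in s["strs"] and enriches(s["strs"][X], sub)
                for X, sub in s2["strs"].items())
    )


def matches(s, params, body):
    """Definition 3: return a realisation phi with s enriching phi(body),
    or None. Assumes each parameter names a top-level opaque type."""
    phi = {u: s["types"][t] for t, u in body["types"].items()
           if u in params and t in s["types"]}
    if set(phi) != set(params):
        return None
    return phi if enriches(s, realise_str(phi, body)) else None


# The signature Stream as Λ{?state}.S', and the type of TwoOnwards.
stream_body = {
    "types": {"nat": "int", "state": "?state"},
    "vals": {"start": "?state",
             "next": ("->", "?state", "?state"),
             "value": ("->", "?state", "int")},
    "strs": {},
}
two_onwards_type = {
    "types": {"nat": "int", "state": "int"},
    "vals": {"start": "int",
             "next": ("->", "int", "int"),
             "value": ("->", "int", "int")},
    "strs": {},
}
phi = matches(two_onwards_type, {"?state"}, stream_body)
```

Here the realisation found maps the abstract state type to int; a structure lacking one of the specified components yields no realisation.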

The static semantics of Mini-SML is defined by the denotation judgements 
in Fig. 5 that relate type phrases to their denotations, and the classification 
judgements in Fig. 6 that relate term phrases to their semantic types. A complete 
presentation and detailed explanation of these rules may be found in [18, 16, 17]. 
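Core types, $\mathcal{C} \vdash \mathsf{u} \triangleright u$:

$$\dfrac{\mathsf{t} \in \mathcal{D}(\mathcal{C})}{\mathcal{C} \vdash \mathsf{t} \triangleright \mathcal{C}(\mathsf{t})}
\qquad
\dfrac{\mathcal{C} \vdash \mathsf{u} \triangleright u \quad \mathcal{C} \vdash \mathsf{u}' \triangleright u'}{\mathcal{C} \vdash \mathsf{u} \to \mathsf{u}' \triangleright u \to u'}
\qquad
\dfrac{}{\mathcal{C} \vdash \mathsf{int} \triangleright \mathrm{int}}
\qquad
\dfrac{\mathcal{C} \vdash \mathsf{sp} : \mathcal{S} \quad \mathsf{t} \in \mathcal{D}(\mathcal{S})}{\mathcal{C} \vdash \mathsf{sp.t} \triangleright \mathcal{S}(\mathsf{t})}$$

Signature bodies, $\mathcal{C} \vdash \mathsf{B} \triangleright \mathcal{L}$:

$$\dfrac{\mathcal{C} \vdash \mathsf{u} \triangleright u \quad \mathcal{C}, \mathsf{t} \triangleright u \vdash \mathsf{B} \triangleright \Lambda P.\mathcal{S} \quad \mathsf{t} \notin \mathcal{D}(\mathcal{S}) \quad P \cap \mathcal{V}(u) = \emptyset}{\mathcal{C} \vdash (\mathsf{type\ t = u;\ B}) \triangleright \Lambda P.(\mathsf{t} \triangleright u, \mathcal{S})}$$

$$\dfrac{\alpha \notin \mathcal{V}(\mathcal{C}) \quad \mathcal{C}, \mathsf{t} \triangleright \alpha \vdash \mathsf{B} \triangleright \Lambda P.\mathcal{S} \quad \mathsf{t} \notin \mathcal{D}(\mathcal{S}) \quad \alpha \notin P}{\mathcal{C} \vdash (\mathsf{type\ t;\ B}) \triangleright \Lambda\{\alpha\} \cup P.(\mathsf{t} \triangleright \alpha, \mathcal{S})}$$

$$\dfrac{\mathcal{C} \vdash \mathsf{u} \triangleright u \quad \mathcal{C}, \mathsf{x} : u \vdash \mathsf{B} \triangleright \Lambda P.\mathcal{S} \quad \mathsf{x} \notin \mathcal{D}(\mathcal{S}) \quad P \cap \mathcal{V}(u) = \emptyset}{\mathcal{C} \vdash (\mathsf{val\ x : u;\ B}) \triangleright \Lambda P.(\mathsf{x} : u, \mathcal{S})}$$

$$\dfrac{\mathcal{C} \vdash \mathsf{S} \triangleright \Lambda P.\mathcal{S} \quad P \cap \mathcal{V}(\mathcal{C}) = \emptyset \quad \mathcal{C}, \mathsf{X} : \mathcal{S} \vdash \mathsf{B} \triangleright \Lambda Q.\mathcal{S}' \quad \mathsf{X} \notin \mathcal{D}(\mathcal{S}') \quad Q \cap (P \cup \mathcal{V}(\mathcal{S})) = \emptyset}{\mathcal{C} \vdash (\mathsf{structure\ X : S;\ B}) \triangleright \Lambda P \cup Q.(\mathsf{X} : \mathcal{S}, \mathcal{S}')}$$

$$\dfrac{}{\mathcal{C} \vdash \epsilon_\mathsf{B} \triangleright \Lambda\emptyset.\epsilon_\mathcal{S}}$$

Signature expressions, $\mathcal{C} \vdash \mathsf{S} \triangleright \mathcal{L}$:

$$\dfrac{\mathcal{C} \vdash \mathsf{B} \triangleright \mathcal{L}}{\mathcal{C} \vdash \mathsf{sig\ B\ end} \triangleright \mathcal{L}}
\qquad
\dfrac{\mathsf{T} \in \mathcal{D}(\mathcal{C})}{\mathcal{C} \vdash \mathsf{T} \triangleright \mathcal{C}(\mathsf{T})}$$

Fig. 5. Denotation Judgements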



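Core expressions, $\mathcal{C} \vdash \mathsf{e} : u$:

$$\dfrac{\mathsf{x} \in \mathcal{D}(\mathcal{C})}{\mathcal{C} \vdash \mathsf{x} : \mathcal{C}(\mathsf{x})}
\qquad
\dfrac{\mathcal{C} \vdash \mathsf{u} \triangleright u \quad \mathcal{C}, \mathsf{x} : u \vdash \mathsf{e} : u'}{\mathcal{C} \vdash \lambda\mathsf{x : u.e} : u \to u'}
\qquad
\dfrac{\mathcal{C} \vdash \mathsf{sp} : \mathcal{S} \quad \mathsf{x} \in \mathcal{D}(\mathcal{S})}{\mathcal{C} \vdash \mathsf{sp.x} : \mathcal{S}(\mathsf{x})}$$

Structure paths, $\mathcal{C} \vdash \mathsf{sp} : \mathcal{S}$:

$$\dfrac{\mathsf{X} \in \mathcal{D}(\mathcal{C})}{\mathcal{C} \vdash \mathsf{X} : \mathcal{C}(\mathsf{X})}
\qquad
\dfrac{\mathcal{C} \vdash \mathsf{sp} : \mathcal{S} \quad \mathsf{X} \in \mathcal{D}(\mathcal{S})}{\mathcal{C} \vdash \mathsf{sp.X} : \mathcal{S}(\mathsf{X})}$$

Structure bodies, $\mathcal{C} \vdash \mathsf{b} : \mathcal{X}$:

$$\dfrac{\mathcal{C} \vdash \mathsf{u} \triangleright u \quad \mathcal{C}, \mathsf{t} \triangleright u \vdash \mathsf{b} : \exists P.\mathcal{S} \quad P \cap \mathcal{V}(u) = \emptyset}{\mathcal{C} \vdash (\mathsf{type\ t = u;\ b}) : \exists P.(\mathsf{t} \triangleright u, \mathcal{S})}$$

$$\dfrac{\mathcal{C} \vdash \mathsf{e} : u \quad \mathcal{C}, \mathsf{x} : u \vdash \mathsf{b} : \exists P.\mathcal{S} \quad P \cap \mathcal{V}(u) = \emptyset}{\mathcal{C} \vdash (\mathsf{val\ x = e;\ b}) : \exists P.(\mathsf{x} : u, \mathcal{S})}$$

$$\dfrac{\mathcal{C} \vdash \mathsf{s} : \exists P.\mathcal{S} \quad P \cap \mathcal{V}(\mathcal{C}) = \emptyset \quad \mathcal{C}, \mathsf{X} : \mathcal{S} \vdash \mathsf{b} : \exists Q.\mathcal{S}' \quad Q \cap (P \cup \mathcal{V}(\mathcal{S})) = \emptyset}{\mathcal{C} \vdash (\mathsf{structure\ X = s;\ b}) : \exists P \cup Q.(\mathsf{X} : \mathcal{S}, \mathcal{S}')}$$

$$\dfrac{\mathcal{C} \vdash \mathsf{S} \triangleright \Lambda P.\mathcal{S} \quad P \cap \mathcal{V}(\mathcal{C}) = \emptyset \quad \mathcal{C}, \mathsf{X} : \mathcal{S} \vdash \mathsf{s} : \mathcal{X} \quad \mathcal{C}, \mathsf{F} : \forall P.\mathcal{S} \to \mathcal{X} \vdash \mathsf{b} : \mathcal{X}'}{\mathcal{C} \vdash (\mathsf{functor\ F\ (X : S) = s;\ b}) : \mathcal{X}'}$$

$$\dfrac{\mathcal{C} \vdash \mathsf{S} \triangleright \mathcal{L} \quad \mathcal{C}, \mathsf{T} \triangleright \mathcal{L} \vdash \mathsf{b} : \mathcal{X}}{\mathcal{C} \vdash (\mathsf{signature\ T = S;\ b}) : \mathcal{X}}
\qquad
\dfrac{}{\mathcal{C} \vdash \epsilon_\mathsf{b} : \exists\emptyset.\epsilon_\mathcal{S}}$$

Structure expressions, $\mathcal{C} \vdash \mathsf{s} : \mathcal{X}$:

$$\dfrac{\mathcal{C} \vdash \mathsf{sp} : \mathcal{S}}{\mathcal{C} \vdash \mathsf{sp} : \exists\emptyset.\mathcal{S}}
\qquad
\dfrac{\mathcal{C} \vdash \mathsf{b} : \mathcal{X}}{\mathcal{C} \vdash \mathsf{struct\ b\ end} : \mathcal{X}}$$

$$\dfrac{\mathcal{C} \vdash \mathsf{s} : \exists P.\mathcal{S} \quad P \cap \mathcal{V}(\mathcal{C}(\mathsf{F})) = \emptyset \quad \mathcal{C}(\mathsf{F}) > \mathcal{S}' \to \exists Q.\mathcal{S}'' \quad \mathcal{S} \succeq \mathcal{S}' \quad Q \cap P = \emptyset}{\mathcal{C} \vdash \mathsf{F(s)} : \exists P \cup Q.\mathcal{S}''}$$

$$\dfrac{\mathcal{C} \vdash \mathsf{s} : \exists P.\mathcal{S} \quad \mathcal{C} \vdash \mathsf{S} \triangleright \Lambda Q.\mathcal{S}' \quad P \cap \mathcal{V}(\Lambda Q.\mathcal{S}') = \emptyset \quad \mathcal{S} \succeq \varphi(\mathcal{S}') \quad \mathcal{D}(\varphi) = Q}{\mathcal{C} \vdash (\mathsf{s :> S}) : \exists Q.\mathcal{S}'}$$

Fig. 6. Classification Judgements (some rules for Core expressions omitted)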
5 Package Types

The motivation for introducing first-class structures is to extend the range of
computations on structures. One way to do this is to extend structure
expressions, and thus computation at the Modules level, with the general-purpose
computational constructs usually associated with the Core. Instead of complicating
the Modules language in this way, we propose to maintain the distinction
between Core and Modules, but relax the stratification. Our proposal is to extend
the Core language with a family of Core types, called package types,
corresponding to first-class structures. A package type is introduced by encapsulating, or
packing, a structure as a Core value. A package type is eliminated by breaking
an encapsulation, opening a Core value as a structure in the scope of another
Core expression. Because package types are ordinary Core types, packages are
first-class citizens of the Core. The introduction and elimination phrases
allow computation to alternate between computation at the level of Modules and
computation at the level of the Core, without having to identify the notions of
computation.

Our extension requires just three new syntactic constructs, all of which are
additions to the Core language:

    Core Types        u ::= ... | <S>                    package type
    Core Expressions  e ::= ... | pack s as S            package introduction
                              | open e as X : S in e'    package elimination

The syntactic Core type <S>, which we call a package type, denotes the
type of a Core expression that evaluates to an encapsulated structure value. The
actual type of this structure value must match the signature S: i.e. if S denotes
$\Lambda P.\mathcal{S}$, then the type of the encapsulated structure must be a subtype of $\varphi(\mathcal{S})$,
for $\varphi$ a realisation with $\mathcal{D}(\varphi) = P$. Two package types <S> and <S'> will be
equivalent if their denotations (not just their syntactic forms) are equivalent.

The Core expression pack s as S introduces a value of package type <S>. 
Assuming a call-by-value dynamic semantics, the phrase is evaluated by evaluating
the structure expression s and encapsulating the resulting structure value as
a Core value. The static semantics needs to ensure that the type of the structure 
expression matches the signature S. Note that two expressions pack s as S and 
pack s' as S may have the same package type <S> even when the actual types 
of s and s' differ (i.e. the types both match the signature, but in different ways). 

The Core expression open e as X : S in e' eliminates a value of package 
type <S>. Assuming a call-by-value dynamic semantics, the expression e is 
evaluated to an encapsulated structure value, this value is bound to the structure 
identifier X, and the value of the entire phrase is obtained by evaluating the client 
expression e' in the extended environment. The static semantics needs to ensure 
that e has the package type <S> and that the type of e' does not vary with the 
actual type of the encapsulated structure X. 

The semantic Core types of Mini-SML must be extended with the semantic
counterpart of syntactic package types. In Mini-SML, the type of a structure
expression is an existential structure $\mathcal{X}$ determined by the judgement
form $C \vdash \mathtt{s} : \mathcal{X}$. Similarly, the denotation of a package
type, which describes the type of an encapsulated structure value, is just an
encapsulated existential structure:

$u \in \mathit{Type} ::= \ldots \mid \langle \mathcal{X} \rangle$    semantic package type

We identify semantic package types that are equivalent up to matching: 

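Definition 4 (Equivalence of Semantic Package Types) Two semantic
package types $\langle \exists P.\mathcal{S} \rangle$ and
$\langle \exists P'.\mathcal{S}' \rangle$ are equivalent if, and only if,

- $P' \cap \mathrm{FV}(\exists P.\mathcal{S}) = \emptyset$ and $\mathcal{S}' \succeq \varphi(\mathcal{S})$ for some realisation $\varphi$ with $\mathrm{Dom}(\varphi) = P$;

- $P \cap \mathrm{FV}(\exists P'.\mathcal{S}') = \emptyset$ and $\mathcal{S} \succeq \varphi'(\mathcal{S}')$ for some realisation $\varphi'$ with $\mathrm{Dom}(\varphi') = P'$.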

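The following rules extend the Core judgements $C \vdash \mathtt{u} \triangleright u$
and $C \vdash \mathtt{e} : u$:

$$\frac{C \vdash \mathtt{S} \triangleright \Lambda P.\mathcal{S}}{C \vdash \mathtt{<S>} \triangleright \langle \exists P.\mathcal{S} \rangle} \qquad (1)$$

Rule 1 relates a syntactic package type to its denotation as a semantic package
type. The parameters of the semantic signature $\Lambda P.\mathcal{S}$ stem from
opaque type specifications in S and determine the quantifier of the package type
$\langle \exists P.\mathcal{S} \rangle$.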



$$\frac{C \vdash \mathtt{s} : \exists P.\mathcal{S} \quad C \vdash \mathtt{S} \triangleright \Lambda Q.\mathcal{S}' \quad P \cap \mathrm{FV}(\Lambda Q.\mathcal{S}') = \emptyset \quad \mathcal{S} \succeq \varphi(\mathcal{S}') \quad \mathrm{Dom}(\varphi) = Q}{C \vdash (\mathtt{pack\ s\ as\ S}) : \langle \exists Q.\mathcal{S}' \rangle} \qquad (2)$$

Rule 2 is the introduction rule for package types. Provided s has existential
type $\exists P.\mathcal{S}$ and S denotes the semantic signature
$\Lambda Q.\mathcal{S}'$, the existential quantification over $P$ is eliminated
in order to verify that $\mathcal{S}$ matches the signature. The side condition
$P \cap \mathrm{FV}(\Lambda Q.\mathcal{S}') = \emptyset$ prevents the capture of
free variables in the signature by the bound variables in $P$ and ensures that
these variables are treated as hypothetical types. The semantic signature
$\Lambda Q.\mathcal{S}'$ describes a family of semantic structures and the
requirement is that the type $\mathcal{S}$ of the structure expression enriches,
i.e. is a subtype of, some member $\varphi(\mathcal{S}')$ of this family. In
the resulting package type $\langle \exists Q.\mathcal{S}' \rangle$, the
existential quantification over $Q$ hides the actual realisation, rendering type
components specified opaquely in S abstract. Because the rule merely requires
that $\mathcal{S}$ is a subtype of $\varphi(\mathcal{S}')$, the
package pack s as S may have fewer components than the actual structure s.
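
As an informal illustration of the last remark, the following Python sketch gives a purely dynamic analogue of packing and opening. It is not the static semantics of Rules 1 to 3: it checks only component names, not types, and the names `pack`, `open_package`, `SignatureMismatch` and `COUNTER_SIG` are hypothetical and introduced here only for the example.

```python
class SignatureMismatch(Exception):
    """Raised when a structure lacks a component required by the signature."""


def pack(structure, signature):
    """Dynamic analogue of `pack s as S`: keep only the components named in S."""
    missing = [name for name in signature if name not in structure]
    if missing:
        raise SignatureMismatch(f"missing components: {missing}")
    # Components not mentioned in the signature are forgotten by the package.
    return {name: structure[name] for name in signature}


def open_package(package, client):
    """Dynamic analogue of `open e as X : S in e'`: run the client on the package."""
    return client(package)


# Hypothetical signature and structure; the extra component `debug` is dropped.
COUNTER_SIG = ("start", "next", "value")
counter = {"start": 0, "next": lambda s: s + 1, "value": lambda s: s, "debug": True}

packed = pack(counter, COUNTER_SIG)
result = open_package(packed, lambda X: X["value"](X["next"](X["start"])))
```

The client passed to `open_package` can only use the components listed in the signature, which mirrors the observation that a package may have fewer components than the structure it was built from.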



$$\frac{C \vdash \mathtt{e} : \langle \exists P.\mathcal{S} \rangle \quad C \vdash \mathtt{S} \triangleright \Lambda P.\mathcal{S} \quad P \cap \mathrm{FV}(C) = \emptyset \quad C, \mathtt{X} : \mathcal{S} \vdash \mathtt{e'} : u \quad P \cap \mathrm{FV}(u) = \emptyset}{C \vdash (\mathtt{open\ e\ as\ X : S\ in\ e'}) : u} \qquad (3)$$

Rule 3 is the elimination rule for package types. Provided e has package
type $\langle \exists P.\mathcal{S} \rangle$, where this type is determined by
the denotation of the explicit syntactic signature S, the client e' of the
package is classified in the extended context $C, \mathtt{X} : \mathcal{S}$. The
side condition $P \cap \mathrm{FV}(C) = \emptyset$ prevents the capture of free
variables in $C$ by the bound variables in $P$ and ensures that these variables
are treated as hypothetical types for the classification of e'. By requiring
that e' is polymorphic in $P$, the actual realisation of these hypothetical
types is allowed to vary with the value of e. Moreover, because $\mathcal{S}$ is
a generic structure matching the signature S, the rule ensures that e' does not
access any components of X that are not specified in S: thus the existence of
any unspecified components is allowed to vary with the actual value of e.
Finally, the side condition $P \cap \mathrm{FV}(u) = \emptyset$ prevents any
variation in the actual realisation of $P$ from affecting the type of the phrase.

structure Sieve =
  struct type nat = TwoOnwards.nat; type state = <Stream>;
         val start = pack TwoOnwards as Stream;
         val next = λs:state. open s as S:Stream in pack Next(S) as Stream;
         val value = λs:state. open s as S:Stream in S.value S.start
  end;

val nthstate = fix λnthstate: int->Sieve.state.
                 λn:int. ifzero n then Sieve.start
                         else Sieve.next (nthstate (pred n));
val nthprime = λn:int. Sieve.value (nthstate n);

Fig. 7. The Sieve implemented using package types.


Observe that the explicit signature S in the term open e as X : S in e'
uniquely determines the Core type of the expression e. This is significant for an
implicitly typed language like Standard ML’s Core: the explicit signature ensures
that the type inference problem for that Core remains tractable and has principal
solutions. Intuitively, the type inference algorithm [16] never has to guess the
type of an expression that is used as a package. The explicit signature in the term
pack s as S ensures that the package type of the expression corresponds to a
well-formed signature (this may not be the case for the actual type of s): testing
the equivalence of such well-formed package types (even modulo unification) can
be performed by two appeals to a signature matching algorithm [16].

Rules 2 and 3 are closely related to the standard rules for second-order
existential types in Type Theory [12]. The main difference, aside from
manipulating n-ary, not just unary, quantifiers, is that these rules also mediate
between the universe of Module types and the universe of Core types. [16, 18]
sketch proofs of type soundness for package types; [18] discusses implementation
issues.

6 The Sieve Revisited 

The addition of package types allows us to define the structure Sieve
implementing the Sieve of Eratosthenes (Fig. 7). The Core type Sieve.state is the
type of packaged streams <Stream>. The Core value Sieve.start is the packaged
stream TwoOnwards. The Core function Sieve.next returns the next state
of Sieve by opening the supplied state, sifting the encapsulated stream, and
packaging the resulting stream as a Core value. The Core function Sieve.value
returns the first value of its encapsulated stream argument.

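It is easy to verify that Sieve has type:

$$\exists \emptyset.(\mathit{nat} \triangleright \mathit{int},\ \mathit{state} \triangleright u,\ \mathit{start} : u,\ \mathit{next} : u \to u,\ \mathit{value} : u \to \mathit{int}),$$

where $u \equiv \langle \exists \{\alpha\}.(\mathit{nat} \triangleright \mathit{int},\ \mathit{state} \triangleright \alpha,\ \mathit{start} : \alpha,\ \mathit{next} : \alpha \to \alpha,\ \mathit{value} : \alpha \to \mathit{int}) \rangle$
is the type of packed streams.

Sieve elegantly captures the impredicative description of the Sieve as a
stream constructed from streams: its type also matches Stream, since

$$(\mathit{nat} \triangleright \mathit{int},\ \mathit{state} \triangleright u,\ \mathit{start} : u,\ \mathit{next} : u \to u,\ \mathit{value} : u \to \mathit{int}) \succeq \{\alpha \mapsto u\}\,(\mathit{nat} \triangleright \mathit{int},\ \mathit{state} \triangleright \alpha,\ \mathit{start} : \alpha,\ \mathit{next} : \alpha \to \alpha,\ \mathit{value} : \alpha \to \mathit{int}).$$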

Sieve is a useful implementation because it allows us to define the functions
nthstate and nthprime of Fig. 7. Since the states of Sieve are just ordinary
Core values, which happen to have package types, the function nthstate n can
use recursion on n to construct the n-th state of Sieve. In turn, this permits
the function nthprime n to calculate the n-th prime, for an arbitrary n. Recall
that, in the absence of package types, these functions could not be defined using
the implementation of the Sieve we gave in Section 3.
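
For readers who prefer a runnable analogue, the construction of Fig. 7 can be approximated in Python, where objects are already first-class values. This is only a sketch under stated assumptions: the class `Stream`, the function `next_stream` (standing in for the functor Next, whose definition lies outside this section and is assumed here to sift out multiples of the stream's first value) and all other names are hypothetical, and Python offers none of the static guarantees of package types.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Stream:
    """A packed structure matching a Stream-like signature; the state stays opaque."""
    start: Any
    next: Callable[[Any], Any]
    value: Callable[[Any], int]


# Analogue of TwoOnwards: the stream 2, 3, 4, ...
two_onwards = Stream(start=2, next=lambda s: s + 1, value=lambda s: s)


def next_stream(s: Stream) -> Stream:
    """Assumed analogue of Next(S): drop every multiple of the first value of s."""
    head = s.value(s.start)

    def sift(state):
        while s.value(state) % head == 0:
            state = s.next(state)
        return state

    return Stream(start=sift(s.start), next=lambda st: sift(s.next(st)), value=s.value)


# Analogue of the structure Sieve: its states are packed streams.
sieve = Stream(start=two_onwards, next=next_stream, value=lambda s: s.value(s.start))


def nthstate(n: int) -> Stream:
    """Recursion on n over ordinary values that happen to be packed streams."""
    return sieve.start if n == 0 else sieve.next(nthstate(n - 1))


def nthprime(n: int) -> int:
    return sieve.value(nthstate(n))
```

With these definitions, `nthprime(0)` is 2 and `nthprime(4)` is 11. Note that `sieve` is itself a `Stream`, which mirrors the remark that the type of Sieve also matches Stream.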

7 Another Example: Dynamically-Sized Arrays 

Package types permit the actual realisation of an abstract type to depend on the 
result of a Core computation. For this reason, package types strictly extend the 
class of abstract types that can be defined in vanilla Mini-SML. 

A familiar example of such a type is the type of dynamically allocated arrays
of size n, where n is a value that is computed at run-time. For simplicity, we
implement functional arrays of size $2^n$, for arbitrary $n \geq 0$ (Fig. 8).

The signature Array specifies structures implementing integer arrays with
the following interpretation. For a fixed n, the type array represents arrays
containing $2^n$ entries of type entry (equivalent to int). The function init e
creates an array that has its entries initialised to the value of e. The function
sub a i returns the value of the $(i \bmod 2^n)$-th entry of the array a. The
function update a i e returns an array that is equivalent to the array a, except
for the $(i \bmod 2^n)$-th entry that is updated with the value of e.
Interpreting each index i modulo $2^n$ allows us to omit array bound checks.

The structure ArrayZero implements arrays of size $2^0 = 1$. An array is
represented by its sole entry with trivial init, sub and update functions.

The functor ArraySucc maps a structure A, implementing arrays of size $2^n$,
to a structure implementing arrays of size $2^{n+1}$. The functor represents an
array of size $2^{n+1}$ as a pair of arrays of size $2^n$. Entries with even
(odd) indices are stored in the first (second) component of the pair. The
function init e returns a pair of initialised arrays of size $2^n$. The function
sub a i (update a i e) uses the parity of i to determine which subarray to
subscript (update).

The Core function mkArray n uses recursion on n to construct a package
implementing arrays of size $2^n$. Notice that the actual realisation of the
abstract type array returned by mkArray n is a balanced, nested cross product of
depth n: the shape of this type depends on the run-time value of n.
Interestingly, this is an example of “data-structural bootstrapping” [14] yet
does not use non-regular recursive types or polymorphic recursion: it does not
even use recursive types!

signature Array = sig type entry = int; type array; (* array is opaque *)
                      val init: entry -> array;
                      val sub: array -> int -> entry;
                      val update: array -> int -> entry -> array
                  end;

structure ArrayZero = struct type entry = int; type array = entry;
                             val init = λe:entry. e;
                             val sub = λa:array. λi:int. a;
                             val update = λa:array. λi:int. λe:entry. e
                      end;

functor ArraySucc (A: Array) =
  struct type entry = A.entry; type array = A.array * A.array;
         val init = λe:entry. (A.init e, A.init e);
         val sub = λa:array. λi:int.
                     ifzero mod i 2 then A.sub (fst a) (div i 2)
                     else A.sub (snd a) (div i 2);
         val update = λa:array. λi:int. λe:entry.
                     ifzero mod i 2 then (A.update (fst a) (div i 2) e, snd a)
                     else (fst a, A.update (snd a) (div i 2) e)
  end;

val mkArray = fix λmkArray: int-><Array>.
                λn:int. ifzero n then pack ArrayZero as Array
                        else open mkArray (pred n) as A:Array in
                             pack ArraySucc(A) as Array;

Fig. 8. mkArray n returns an abstract implementation of arrays of size $2^n$.
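
The run-time dependence of the representation on n can also be observed in a small Python analogue of Fig. 8. The sketch below is an assumption-laden transcription rather than the Mini-SML code itself: `ArrayPkg`, `array_zero`, `array_succ` and `mk_array` are hypothetical names, and the abstraction of the array type is by convention only.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ArrayPkg:
    """A packed structure matching an Array-like signature."""
    init: Callable[[int], Any]
    sub: Callable[[Any, int], int]
    update: Callable[[Any, int, int], Any]


# Analogue of ArrayZero: an array of size 2**0 is just its sole entry.
array_zero = ArrayPkg(
    init=lambda e: e,
    sub=lambda a, i: a,
    update=lambda a, i, e: e,
)


def array_succ(A: ArrayPkg) -> ArrayPkg:
    """Analogue of ArraySucc: a pair of arrays, split by the parity of the index."""

    def sub(a, i):
        half = a[0] if i % 2 == 0 else a[1]
        return A.sub(half, i // 2)

    def update(a, i, e):
        if i % 2 == 0:
            return (A.update(a[0], i // 2, e), a[1])
        return (a[0], A.update(a[1], i // 2, e))

    return ArrayPkg(init=lambda e: (A.init(e), A.init(e)), sub=sub, update=update)


def mk_array(n: int) -> ArrayPkg:
    """Analogue of mkArray: recursion on n builds a package for size 2**n."""
    return array_zero if n == 0 else array_succ(mk_array(n - 1))


A3 = mk_array(3)                 # arrays with 2**3 = 8 entries
a = A3.init(0)
b = A3.update(a, 5, 42)          # functional update: `a` itself is unchanged
```

Here `A3.init(0)` is a nested pair of depth 3, and `A3.sub(b, 13)` returns 42 because the index 13 is interpreted modulo 8.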



8 Contribution 

For presentation purposes, we restricted our attention to an explicitly typed,
monomorphic Core language and a first-order Modules language. In [16], we
demonstrate that the extension with package types may also be applied to a
Standard ML-like Core language that supports the definition of type
constructors and implicitly typed, polymorphic values. For instance, Section 7.3
of [16] generalises the example of Section 7 to an implementation of dynamically
sized polymorphic arrays where array is a unary type constructor taking the type
of entries as an argument and the array operations are suitably polymorphic.
Moreover, this extension is formulated with respect to a higher-order Modules
calculus that allows functors, not just structures, to be treated as first-class
citizens of the Modules language and, via package types, the Core language too.
This proposal is practical: we present a well-behaved algorithm that integrates
type inference for the extended Core with type checking for higher-order Modules.
First-class and higher-order modules are available in Moscow ML V2.00 [15].

Our approach to obtaining first-class structures is novel because it leaves the
Modules language unchanged, relies on a simple extension of the Core language
only and avoids introducing subtyping in the Core type system, which would
otherwise pose severe difficulties for Core-ML type inference. (Although
Mitchell et al. [11, 5] first suggested the idea of coercing a structure to a
first-class existential type, they did not require explicit introduction and
elimination terms: our insistence on these terms enables Core-ML type inference.)
Our work refutes Harper and Mitchell’s claim [2] that the existing type structure
of Standard ML cannot accommodate first-class structures without sacrificing the
compile-time/run-time phase distinction and decidable type checking. This is a
limitation of their proposed model, which is based on first-order dependent
types, but does not transfer to the simpler, second-order type theory [17] of
Standard ML.

Our motivation for introducing first-class structures was to extend the range
of computations on structures. One way to achieve this is to extend structure
expressions directly with computational constructs usually associated with the
Core. Taken to the extreme, this approach relaxes the stratification between
Modules and the Core by removing the distinction between them, amalgamating
both in a single language. This is the route taken by Harper and Lillibridge [1, 8].
Unfortunately, the identification of Core and Modules types renders subtyping,
and thus type checking, undecidable. Leroy [6] briefly considers this approach
without formalising it but observes that the resulting interaction between Core
and Modules computation violates the type soundness of applicative functors
[7]. Odersky and Läufer [13] and Jones [4] adopt a different tack and extend
implicitly typed Core-ML with impredicative type quantification and
higher-order type constructors that can model some, but not all, of the features
of Standard ML Modules while providing first-class and higher-order modules.

Our approach is different. We maintain the distinction between Core and
Modules, but relax the stratification by extending the Core language with
package types. The introduction and elimination phrases for package types allow
computation to alternate between computation at the level of Modules and
computation at the level of the Core, without having to identify the notions
of computation. This is reflected in the type system in which the Modules and
Core typing relations are distinct. This is a significant advantage for implicitly
typed Core languages like Core-ML. At the Modules level, the explicitly typed
nature of Modules makes it possible to accommodate subtyping, functors with
polymorphic arguments and true type constructors in the type checker for the
typing relation. At the Core-ML level, the absence of subtyping, the
restriction that ML functions may only take monomorphic arguments and that ML
type variables range over types (but not type constructors) permits the use of
Hindley-Milner [3, 9] type inference. In comparison, the amalgamated languages
of [1, 8] support type constructors and subtyping, but at the cost of an
explicitly typed Core fragment; [13, 4] support partial type inference, but do
not provide subtyping on structures, type components in structures or a full
treatment of type constructors, whose expansion and contraction must be mediated
by explicit Core terms instead of implicit β-conversion.

Although not illustrated here, the advantage of distinguishing between
Modules computation and Core computation is that they can be designed to satisfy
different invariants [16]. For instance, the invariant needed to support applicative
functors [7, 16], namely that the abstract types returned by a functor depend
only on its type arguments and not the value of its term argument, is violated
if we extend Modules computation directly with general-purpose computational
constructs. Applicative functors provide better support for programming with
higher-order Modules; general-purpose constructs are vital for a useful Core. In
[16], we show that maintaining the separation between Modules and Core
computation accommodates both applicative functors and a general-purpose Core,
without violating type soundness. Type soundness is preserved by the addition
of package types, because these merely extend the computational power of the
Core, not Modules (package elimination is weaker than including Core
expressions in Module expressions). The languages of [1, 8] have higher-order
functors, but their single notion of computation implies a trade-off between
supporting either applicative functors or general-purpose computation. Since
ruling out the latter is too restrictive, the functors of these calculi are not
applicative.

Abstract. We provide a uniform framework for the analysis of programs
with procedures and explicit, unbounded, fork/join parallelism covering
not only bitvector problems like reaching definitions or live variables
but also non-bitvector problems like simple constant propagation. Due
to their structural similarity to the sequential case, the resulting
algorithms are as efficient as their widely accepted sequential counterparts,
and they can easily be integrated in existing program analysis
environments such as MetaFrame or PAG. We are therefore convinced that
our method will soon find its way into industrial-scale computer systems.

Keywords: Inter-procedural program analysis, explicit parallelism,
bitvector problems, simple constant propagation, coincidence theorems.



1 Introduction 

The analysis of parallel programs is known as a notoriously hard problem. Even
without procedures and with only bounded parallelism the analysis typically
suffers from the so-called state explosion problem: in general, already the
required control structures grow exponentially with the number of parallel
components. Bitvector analyses, dominant in most practical compilers, escape this
problem in the context of fork/join-parallelism [11, 9]: a simple pre-process is
sufficient to adapt sequential intra-procedural bitvector analyses to directly
work on parallel flow graphs which concisely and explicitly represent the
program’s parallelism. Key for this adaptation was to change from a property
analysis (directly associating program points with properties) to an effect
analysis¹, associating program points with a property transformer resembling the
effect of the ‘preceding’ program fragment. The simplicity of the adaptation
results from the fact that bitvector analyses can conceptually be “sliced” into
separate analyses for each individual bit-component, each of which only requires
the consideration of a three-point transformer domain.

In order to handle also procedures and unbounded parallelism, Esparza and
Knoop observed that the described problem profile also admits an
automata-theoretic treatment [5]. This observation has been carefully developed
by Esparza and Podelski in [6]. The resulting algorithm requires involved
concepts, such as tree automata, and, from a program analyzer’s perspective, the
results are rather indirect: the reachability analysis computes regular sets
characterizing the set of states satisfying a particular property. More
precisely, the algorithm treats each bit-component of the analysis separately.
For each such component an automata construction is required which is linear in
the product of the size of the program and the size of an automaton describing
reachable configurations. The latter automaton can grow linearly in the size of
the program as well, implying that the analysis of each component is at least
quadratic in the program size.

¹ Second-order analysis in the terminology of [11].

In this paper we present a much more direct framework for the inter-procedural
analysis of fork/join parallel programs. We propose a constraint-based approach
which naturally arises from an algebraic reformulation of the intra-procedural
method presented in [11, 9]. Our approach closely resembles the classical
understanding of bitvector analysis, has a complexity which is linear in the
program size and admits elegant, algebraic proofs. Summarizing, we contribute to
the state of the art by

1. Providing a uniform characterization of the captured analysis profile which
simultaneously addresses all involved program entities, e.g., all program
variables at once for live variable analysis or all program expressions at once
for availability of expressions. Moreover, this profile goes beyond pure
bitvector analyses as it also captures, e.g., simple constant propagation [9].

2. Basing our development on a constraint characterization of valid parallel
execution paths: the constraint system for the actual analyses simply results
from an abstract interpretation [3, 4, 2] of this characterization.

3. Presenting a framework which supports algebraic reasoning. E.g., the proof
for proposition 2(3), resembling the central Main Lemma of [11],
straightforwardly evolves from our profile characterization.

4. Guaranteeing essentially the same performance as for purely inter-procedural
bitvector analyses by exploiting the results of a generalized possible
interference analysis [11].

As a consequence, the presented framework is tightly tailored for the intended
application area. It directly associates the program points with the required
information based on classical constraint solving through (e.g., worklist based)
fixpoint iteration. This can be exploited to obtain simple implementations in
current program analysis generators like DFA&OPT MetaFrame [10] or PAG
[1], which provide all the required fixpoint iteration machinery.

The paper is organized as follows. After formally introducing explicitly parallel 
programs with procedures in section 2, we define the notion of parallel execution 
paths in section 3, and specify our analysis problem in section 4. Section 5 
then presents a precise effect analysis for procedures, which is the basis for the 
precise inter-procedural reachability analysis given in section 6. Finally, section 
7 discusses possible extensions of our formal development, while section 8 gives 
our conclusions and perspectives. 

2 Programs as Control-Flow Graphs

We assume that programs are given as (annotated) control-flow graphs (cfg’s for 
short). An edge in the cfg either is a call of a single procedure, a parallel call to 
two procedures, or a basic computation step. An example of such a cfg is given in 
figure 1. There, we only visualized the annotation of call and parallel call edges. 
Observe that this cfg indeed introduces an unbounded number of instances of 
procedure q running in parallel. 




Fig. 1. An Example Control-flow Graph. 



Formally, a control-flow graph $\mathcal{G}$ for a program with procedures and
explicit parallelism consists of a finite set $\mathit{Proc}$ of procedures
together with a collection $G_p, p \in \mathit{Proc}$, of disjoint
intra-procedural control-flow graphs. We assume that there is one special
procedure main with which program execution starts. The intra-procedural
control-flow graph $G_p$ of a procedure $p$ consists of:

- A set $N_p$ of program points;

- A special entry point $s \in N_p$ as well as a special return point $r \in N_p$;

- A set of edges $E_p \subseteq N_p \times N_p$;

- A subset $C_p \subseteq E_p$ of call edges where for $e \in C_p$, call $e = p$ denotes that
edge $e$ calls the procedure $p$; and finally,

- A subset $P_p \subseteq E_p$ of parallel call edges where for $e \in P_p$, call $e = p_1 \| p_2$
denotes that edge $e$ calls the procedures $p_1$ and $p_2$ in parallel.

Edges which are not contained in $C_p$ or $P_p$ are also called basic edges.
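
One possible concrete representation of such control-flow graphs is sketched below in Python. The classes `Edge` and `Procedure` and the two-procedure example program are hypothetical and serve only to make the three kinds of edges tangible; in particular, the example is not the graph of figure 1, whose details are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    # None: basic edge; one name: call edge; two names: parallel call edge.
    call: Optional[Tuple[str, ...]] = None

    @property
    def kind(self) -> str:
        if self.call is None:
            return "basic"
        return "call" if len(self.call) == 1 else "parallel"


@dataclass
class Procedure:
    entry: str
    ret: str
    edges: List[Edge] = field(default_factory=list)

    def points(self) -> Set[str]:
        """The set N_p of program points mentioned by the procedure."""
        pts = {self.entry, self.ret}
        for e in self.edges:
            pts |= {e.source, e.target}
        return pts


# Hypothetical program: main calls q, and q either performs a basic step or
# calls two further instances of q in parallel (unbounded parallelism).
program = {
    "main": Procedure("m0", "m1", [Edge("m0", "m1", call=("q",))]),
    "q": Procedure("q0", "q1", [Edge("q0", "q1"),
                                Edge("q0", "q1", call=("q", "q"))]),
}

kinds = {name: [e.kind for e in proc.edges] for name, proc in program.items()}
```

Keeping the callee names on the edge mirrors the annotation "call e" used in the definition above.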

Practical Remark: It is just for convenience that we allow only binary
parallelism in our programs. Our methods can be easily adapted to work also for
more procedures being called in parallel or even parallel do-loops.

Also note that we do not consider synchronization between parallel threads by
barriers or semaphores. Such constructs limit the number of possible execution
paths. By ignoring these, we may get more possible execution paths and thus
(perhaps less precise but) still safe analysis results.



3 Parallel Execution Paths 

The semantics of a parallel program is determined w.r.t. the set of parallel
execution paths. What we are now going to formalize is an interleaving semantics
for threads that are executable in parallel. We need the following auxiliary
definitions. Let $E$ denote a finite set of edges. Let $w = e_1 \ldots e_n$ be a
word from $E^*$ and $I = \{i_1 < \ldots < i_k\} \subseteq \{1, \ldots, n\}$ be a
subset of positions in $w$. Then the restriction of $w$ to $I$ is given by
$w|_I = e_{i_1} \ldots e_{i_k}$.

The interleaving of subsets $M_1, M_2 \subseteq E^*$ is defined by

$$M_1 \otimes M_2 = \{ w \in E^* \mid \exists I_1 + I_2 = \{1, \ldots, |w|\} : w|_{I_1} \in M_1 \text{ and } w|_{I_2} \in M_2 \}$$

Here, “+” denotes disjoint union of sets. Thus, $M_1 \otimes M_2$ consists of all
possible interleavings of sequences from $M_1$ and $M_2$. Furthermore for
$M \subseteq E^*$, let $\mathrm{pre}(M)$ denote the set of all prefixes of words
in $M$, i.e.,

$$\mathrm{pre}(M) = \{ u \in E^* \mid \exists v \in E^* : uv \in M \}$$
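
For finite sets of words, the two operators just defined can be computed directly. The following Python sketch is an illustration only; it represents words as tuples of edge labels, and the function names `restrict`, `interleave` and `prefixes` are our own choices rather than notation from the formal development.

```python
from itertools import combinations


def restrict(w, positions):
    """The restriction of w to an increasing sequence of (0-based) positions."""
    return tuple(w[i] for i in positions)


def interleave(m1, m2):
    """All interleavings of a word from m1 with a word from m2 (finite sets)."""
    result = set()
    for w1 in m1:
        for w2 in m2:
            n = len(w1) + len(w2)
            # Choose the positions I1 taken by w1; the remaining ones form I2.
            for i1 in combinations(range(n), len(w1)):
                chosen = set(i1)
                it1, it2 = iter(w1), iter(w2)
                result.add(tuple(next(it1) if i in chosen else next(it2)
                                 for i in range(n)))
    return result


def prefixes(m):
    """pre(M): every prefix of every word in M, including the empty word."""
    return {w[:k] for w in m for k in range(len(w) + 1)}


example = interleave({("a", "b")}, {("c",)})
```

In this example, `example` contains exactly the three words abc, acb and cab: the relative order of a and b is preserved while c may occur anywhere.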

We consider the following sets of possible execution paths:

- For $p \in \mathit{Proc}$, the set $\Pi(p)$ of all execution paths for $p$;

- For program point $v$ of procedure $p$, the set $\Pi(v)$ of all paths starting at the
entry point of $p$ and reaching $v$ on the same level (see below);

- For every procedure $p$, the set $\Pi_r(p)$ of all paths starting from a call of
main and reaching some call of $p$;

- For every program point $v$, the set $\Pi_r(v)$ of all paths starting from a call
of main and reaching program point $v$.

These sets are given through the least solutions of the following constraint
systems (whose variables for simplicity are denoted by
$\Pi(p), \Pi(v), \Pi_r(p), \Pi_r(v)$ as well). Let us start with the defining
constraint system for the sets of same-level execution paths.



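$$
\begin{array}{lll}
\Pi(p) \supseteq \Pi(r) & r \text{ return point of } p & (1) \\
\Pi(s) \supseteq \{\epsilon\} & s \text{ entry point of a procedure} & (2) \\
\Pi(v) \supseteq \Pi(u) \cdot \{e\} & e = (u, v) \text{ basic edge} & (3) \\
\Pi(v) \supseteq \Pi(u) \cdot \Pi(p) & e = (u, v) \text{ calls } p & (4) \\
\Pi(v) \supseteq \Pi(u) \cdot (\Pi(p_1) \otimes \Pi(p_2)) & e = (u, v) \text{ calls } p_1 \| p_2 & (5)
\end{array}
$$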



Lines (1) through (4) are the standard lines to determine the sets of all same-level
execution paths as known from inter-procedural analysis of sequential programs.
Line (1) says that the set of execution paths of procedure $p$ is the set of
same-level paths reaching the return point of $p$. Line (2) says that at least
$\epsilon$ is a same-level execution path that reaches the entry point of a
procedure. Line (3) says that for every basic edge $e = (u, v)$, the set of
same-level execution paths reaching the program point $v$ subsumes all same-level
execution paths to $u$ extended by $e$. Line (4) says that for every edge
$e = (u, v)$ calling a procedure $p$, the set of same-level execution paths
reaching the program point $v$ subsumes all same-level execution paths reaching
$u$ extended by any execution path through the procedure $p$. Line (5) for a
parallel call of $p_1 \| p_2$ has the same form as line (4). But now the
same-level execution paths to the program point before the call are
extended by all interleavings of execution paths for $p_1$ and $p_2$.
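
Since the path sets are in general infinite, they cannot be enumerated, but their restriction to words of bounded length can. The Python sketch below computes this restriction by naive fixpoint iteration over lines (1) to (5). It is an illustration under stated assumptions: the dictionary encoding of procedures, the length bound and the three-procedure example program are all hypothetical. Truncation is harmless here because concatenation and interleaving only ever produce longer words from shorter ones.

```python
from collections import defaultdict
from itertools import combinations


def interleave(m1, m2, bound):
    """Interleavings of words from m1 and m2, keeping only words of length <= bound."""
    out = set()
    for w1 in m1:
        for w2 in m2:
            n = len(w1) + len(w2)
            if n > bound:
                continue
            for i1 in combinations(range(n), len(w1)):
                chosen = set(i1)
                it1, it2 = iter(w1), iter(w2)
                out.add(tuple(next(it1) if i in chosen else next(it2)
                              for i in range(n)))
    return out


def concat(m1, m2, bound):
    return {u + v for u in m1 for v in m2 if len(u) + len(v) <= bound}


def same_level_paths(procs, bound):
    """Least solution of lines (1)-(5), restricted to words of length <= bound."""
    paths_p = {p: set() for p in procs}     # the sets for procedures
    paths_v = defaultdict(set)              # the sets for (procedure, point) pairs

    def grow(target, words):
        before = len(target)
        target |= words
        return len(target) != before

    changed = True
    while changed:
        changed = False
        for p, g in procs.items():
            changed |= grow(paths_v[p, g["entry"]], {()})                # line (2)
            for (u, v, kind, arg) in g["edges"]:
                if kind == "basic":                                      # line (3)
                    ext = {(arg,)}
                elif kind == "call":                                     # line (4)
                    ext = paths_p[arg]
                else:                                                    # line (5)
                    ext = interleave(paths_p[arg[0]], paths_p[arg[1]], bound)
                changed |= grow(paths_v[p, v], concat(paths_v[p, u], ext, bound))
            changed |= grow(paths_p[p], paths_v[p, g["ret"]])            # line (1)
    return paths_p, paths_v


# Hypothetical program: main runs p || q; p is one basic edge a; q is either
# one basic edge b or a parallel call q || q.
procs = {
    "main": {"entry": "m0", "ret": "m1",
             "edges": [("m0", "m1", "par", ("p", "q"))]},
    "p": {"entry": "p0", "ret": "p1", "edges": [("p0", "p1", "basic", "a")]},
    "q": {"entry": "q0", "ret": "q1",
          "edges": [("q0", "q1", "basic", "b"), ("q0", "q1", "par", ("q", "q"))]},
}
paths_p, paths_v = same_level_paths(procs, bound=3)
```

For this example and bound 3, the paths of q are b, bb and bbb, and the paths of main are the interleavings of a with b and with bb.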

In order to specify the sets $\Pi_r(p), \Pi_r(v)$, let us introduce the auxiliary
sets $\Pi(v, p)$, $v$ a program point, $p$ a procedure, which give the sets of
execution paths reaching $v$ from a call of $p$. These auxiliary sets are defined
as the least solution of the following system of constraints:

$$
\begin{array}{lll}
\Pi(v, q) \supseteq \Pi(v) & v \text{ program point of procedure } q & (1) \\
\Pi(v, q) \supseteq \Pi(u) \cdot \Pi(v, p) & e = (u, \_) \text{ calls } p \text{ in } q & (2) \\
\Pi(v, q) \supseteq \Pi(u) \cdot (\Pi(v, p_i) \otimes M) & e = (u, \_) \text{ calls } p_1 \| p_2 \text{ in } q & (3)
\end{array}
$$

where $M$ in line (3) is given by $M = \mathrm{pre}(\Pi(p_{3-i}))$. The intuition
behind this definition is as follows. Line (1) says that whenever $v$ is a
program point of procedure $q$, then the set of execution paths from $q$ to $v$
subsumes all same-level execution paths from $q$ to $v$. Line (2) says that
whenever at some edge $e = (u, \_)$ in the body of procedure $q$, some procedure
$p$ is called, then the set of execution paths from $q$ to $v$ subsumes all
computation paths consisting of a same-level execution path from $q$ to the
program point $u$ followed by an execution path from $p$ to $v$. Finally, line (3)
considers an edge $e = (u, \_)$ in the body of $q$ which is a parallel call of
$p_1$ and $p_2$. Then we have to append to the same-level execution paths to $u$
all interleavings of execution paths from $p_i$ to $v$ with prefixes of
same-level execution paths for the parallel procedure.

Given the $\Pi(v, q)$, we define the values $\Pi_r(v), \Pi_r(p)$ as the least solution of:

$$
\begin{array}{ll}
\Pi_r(v) \supseteq \Pi(v, \mathit{main}) & v \text{ a program point} \\
\Pi_r(p) \supseteq \Pi_r(u) & \text{edge } (u, \_) \text{ calls } p,\ p \| \_ \text{ or } \_ \| p
\end{array}
$$

For now, let us assume that all the sets of execution paths
$\Pi(v), \Pi_r(v), \Pi(p), \Pi_r(p)$ are non-empty. In section 7 we will explain
how this assumption can be removed.

4 Semantics 

Let $\mathbb{D}$ denote a complete lattice and $\mathbb{F} \subseteq \mathbb{D} \to \mathbb{D}$
a subset of monotonic functions from $\mathbb{D}$ to $\mathbb{D}$ which contains
$\lambda x.\bot$ (the constant $\bot$-function) and $I = \lambda x.x$ (the
identity) and is closed under composition “$\circ$” and least upper bounds.
While $\mathbb{D}$ is meant to specify the set of abstract properties,
$\mathbb{F}$ describes all possible ways how properties may be transformed when
passing from one program point to the other. In this paper we make the following
additional assumptions:

- $\mathbb{D}$ is distributive, i.e., $a \sqcup (b \sqcap c) = (a \sqcup b) \sqcap (a \sqcup c)$ holds for all $a, b, c \in \mathbb{D}$;

- $\mathbb{D}$ has height $h < \infty$, i.e., every ascending chain of elements in $\mathbb{D}$ has length
at most $h + 1$;

- the set $\mathbb{F}$ consists of all functions of the form $f\,x = (a \sqcap x) \sqcup b$ with $a, b \in \mathbb{D}$.

Since $\mathbb{D}$ is distributive, all functions $f$ in $\mathbb{F}$ are
distributive as well, i.e., $f\,(a \sqcup b) = (f\,a) \sqcup (f\,b)$ for all
$a, b \in \mathbb{D}$. Let us also mention that neither $\mathbb{D}$ nor
$\mathbb{F}$ is demanded to be finite. However, since $\mathbb{D}$ has height
$h$, the lattice $\mathbb{F}$ has height at most $2h$. The most prominent class
of problems that satisfy our restrictions are bitvector problems like available
expressions, reaching definitions, live variables or very busy expressions [7].
In these cases, we may choose $\mathbb{D} = \mathbb{B}^h$ where
$\mathbb{B} = \{0 \sqsubset 1\}$.

There are, however, further analysis problems which meet our assumptions
without being bitvector problems. This is the case, e.g., for simple constant
propagation. Simple constant propagation tries to determine whether or not some
constant has been assigned to a variable which later on remains unchanged. For
this application, we may choose $\mathbb{D} = V \to \mathbb{B}$ where $V$ is the
set of program variables and $\mathbb{B}$ is the flat lattice of possible values
for program variables. Thus, an abstract value $d \in \mathbb{D}$ represents an
assignment of variables to values. In particular, $\mathbb{D}$ has height
$h = 2 \cdot \#V$. Note furthermore that for simple constant propagation, all
functions $f \in \mathbb{F}$ are of the special form
$f = \lambda x.(a \sqcap x) \sqcup b$ with $a \in \{\bot, \top\}$. Thus,
ascending chains of functions have length at most $3 \cdot \#V$.

Let $E$ denote a set of edges and $[.] : E \to \mathbb{F}$ denote an assignment
of functions to edges. Then we extend $[.]$ to sequences
$w = e_1 \ldots e_n \in E^*$ and sets $M \subseteq E^*$ in the natural way, i.e., by

$$[w] = [e_n] \circ \ldots \circ [e_1] \qquad [M] = \bigsqcup \{ [w] \mid w \in M \}$$

Thus, especially, $[\emptyset] = \lambda x.\bot$ (the least element in
$\mathbb{F}$), and $[\{\epsilon\}] = [\epsilon] = I$. Functions $[w]$ and $[M]$
are also called the effect of the sequence $w$ and the set $M$, respectively.
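
For the bitvector case, these definitions can be made concrete in a few lines of Python. The sketch below is illustrative only and rests on assumptions of our own: abstract values are 4-bit integers, a transformer $\lambda x.(a \sqcap x) \sqcup b$ is stored as the pair `(a, b)`, and the edge names `gen` and `kill` are hypothetical.

```python
from functools import reduce

WIDTH = 4
TOP = (1 << WIDTH) - 1     # the greatest bitvector
BOTTOM_FN = (0, 0)         # the constant bottom function, least element of F
IDENTITY = (TOP, 0)        # the identity function


def apply(f, x):
    """Apply the transformer f = (a, b), i.e. compute (a meet x) join b."""
    a, b = f
    return (a & x) | b


def compose(g, f):
    """g after f; the result is again of the form (a, b)."""
    (ag, bg), (af, bf) = g, f
    return (ag & af, (ag & bf) | bg)


def join(f, g):
    """Pointwise least upper bound of two transformers."""
    return (f[0] | g[0], f[1] | g[1])


def effect_of_word(word, edge_fn):
    """[w] = [e_n] o ... o [e_1]; the empty word yields the identity."""
    return reduce(lambda acc, e: compose(edge_fn[e], acc), word, IDENTITY)


def effect_of_set(words, edge_fn):
    """[M] as the join of the effects of its words; the empty set yields bottom."""
    return reduce(join, (effect_of_word(w, edge_fn) for w in words), BOTTOM_FN)


# Hypothetical basic edges: `gen` sets bit 0, `kill` clears bit 0.
edge_fn = {"gen": (TOP, 0b0001), "kill": (0b1110, 0)}
gen_then_kill = effect_of_word(("gen", "kill"), edge_fn)
either_order = effect_of_set({("gen", "kill"), ("kill", "gen")}, edge_fn)
```

Closure of the pair representation under `compose` and `join` follows from distributivity, which is why the lattice of transformers stays small. Two pairs may denote the same function, so effects are best compared by applying them rather than by comparing pairs.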

For the rest of this paper we assume that we are given an assignment 
[e] = fe = Xx.{Ue n x) U 6e G F 

to each basic edge e of our input program. Then program analysis tries to com- 
pute (approximations to) the following values: 

Effects of Procedures: For each procedure $p$, $\mathrm{Effect}(p) := [\Pi(p)]$ denotes the
effect of the set of all same-level execution paths through $p$;

Reachability: For a start value $d_0 \in \mathbb{D}$, program point $v$ and procedure $p$,
$\mathrm{Reach}(v) := [\Pi_r(v)]\,d_0$ and $\mathrm{Reach}(p) := [\Pi_r(p)]\,d_0$ denote the least upper
bound on all abstract values reaching $v$ along execution paths from main
and the least upper bound on all abstract values reaching calls to $p$,
respectively.

The system of these values is called the Merge-Over-all-Paths solution (abbreviated:
MOP solution) of the analysis problem. Since the respective sets of execution
paths are typically infinite, it is not clear whether this solution can be computed
effectively. The standard approach proposed in data-flow analysis and abstract
interpretation [3, 4, 2] consists in setting up a set $C$ of constraints on the values
we are interested in. The constraints are chosen in such a way that any solution
to $C$ is guaranteed to represent a safe approximation of the values. Quite
frequently, however, the least solution of $C$ equals the MOP solution [8, 13]. Then
we speak of coincidence of the solutions, meaning that $C$ precisely characterizes
the MOP.

In our present application, we are already given a constraint system whose least
solution represents the sets of execution paths which are to be evaluated. By
inspecting this constraint system, we would naturally try to obtain constraint
systems for effect analysis and reachability just by abstracting the lattice of sets
of paths with our lattice $\mathbb{F}$. Thus, the ordering "$\subseteq$" of set inclusion on sets of
paths is mapped to the ordering "$\sqsubseteq$" on $\mathbb{F}$; set union and concatenation are mapped
to least upper bounds and composition of functions. Indeed, this abstraction
mapping $[.]$ has the following properties:

Proposition 1. Let $M_1, M_2 \subseteq E^*$. Then the following holds:

1. $[M_1 \cup M_2] = [M_1] \sqcup [M_2]$;

2. $[M_1 \cdot M_2] = [M_2] \circ [M_1]$ if both $M_1$ and $M_2$ are non-empty. □

Proposition 1 suggests a direct translation of the constraint system for the sets
of execution paths into the constraint system we are aiming at. The only
two obstacles to a direct translation are (1) the need for an abstract interleaving
operator (which for simplicity is denoted by "$\otimes$" as well), and (2) the treatment
of prefixes. For our abstract lattices, these two problems turn out to
have surprisingly simple solutions.

For $f_i = \lambda x.(a_i \sqcap x) \sqcup b_i$, $i = 1, 2$, we define the interleaving of $f_1$ and $f_2$ by:

$$f_1 \otimes f_2 = \lambda x.(a_1 \sqcap a_2 \sqcap x) \sqcup b_1 \sqcup b_2$$

We have: 

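Proposition 2. Let $f_1, f_2, f \in \mathbb{F}$. Then the following holds:

1. $f_1 \otimes f_2 = f_1 \circ f_2 \sqcup f_2 \circ f_1$;

2. $(f_1 \sqcup f_2) \otimes f = f_1 \otimes f \sqcup f_2 \otimes f$;

3. $[M_1 \otimes M_2] = [M_1] \otimes [M_2]$ for non-empty subsets $M_1, M_2 \subseteq E^*$.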
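
Statements (1) and (2) can be checked mechanically on a small instance. The following sketch assumes a two-bit bitvector lattice and represents $\lambda x.(a \sqcap x) \sqcup b$ as the pair `(a, b)`; the exhaustive check over all pairs is only a sanity test under these assumptions, not a proof, and all names are hypothetical.

```python
from itertools import product

H = 2                       # assumed (small) bitvector width
TOP = (1 << H) - 1

def compose(g, f):
    """g ∘ f for functions stored as pairs (a, b)."""
    return (g[0] & f[0], (g[0] & f[1]) | g[1])

def join(f, g):
    """Least upper bound f ⊔ g."""
    return (f[0] | g[0], f[1] | g[1])

def interleave(f, g):
    """f ⊗ g = λx.(a_f ⊓ a_g ⊓ x) ⊔ b_f ⊔ b_g."""
    return (f[0] & g[0], f[1] | g[1])

# Every pair (a, b) over the two-bit lattice.
FUNCS = list(product(range(TOP + 1), repeat=2))

def check_prop2():
    """Exhaustively test statements (1) and (2) of Proposition 2."""
    for f1, f2 in product(FUNCS, repeat=2):
        assert interleave(f1, f2) == join(compose(f1, f2), compose(f2, f1))
    for f1, f2, f in product(FUNCS, repeat=3):
        lhs = interleave(join(f1, f2), f)
        rhs = join(interleave(f1, f), interleave(f2, f))
        assert lhs == rhs
    return True
```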

For a proof of Proposition 2 see appendix A. Let us now consider the set $pre(M)$
of prefixes of a non-empty set $M \subseteq E^*$. Then the following holds:

Proposition 3. Let $E_M$ denote the edges occurring in elements of $M$ where for
$e \in E_M$, $[e] = \lambda x.(a_e \sqcap x) \sqcup b_e$. Then

$$[pre(M)] = \lambda x.x \sqcup B \quad \text{where} \quad B = \bigsqcup \{b_e \mid e \in E_M\}$$ □

Thus, all the intersections with the $a_e$ have disappeared. All that remains is
the least upper bound on the values $b_e$.
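
A small numerical check of Proposition 3 may help to see why the $a_e$ drop out. The sketch below assumes a four-bit bitvector lattice, stores each edge function as a pair `(a, b)`, and compares the effect of an explicitly enumerated prefix set with the closed form $\lambda x.x \sqcup B$; the helper names are hypothetical, and the set of paths is assumed non-empty as in the proposition.

```python
from functools import reduce

H = 4
TOP = (1 << H) - 1
IDENTITY, LEAST = (TOP, 0), (0, 0)

def compose(g, f):
    return (g[0] & f[0], (g[0] & f[1]) | g[1])

def join(f, g):
    return (f[0] | g[0], f[1] | g[1])

def effect_of_path(path):
    """[w] for a path given as a sequence of (a, b) edge functions."""
    return reduce(lambda acc, fe: compose(fe, acc), path, IDENTITY)

def prefix_effect(paths):
    """[pre(M)], computed by enumerating every prefix of every path."""
    prefixes = {tuple(w[:k]) for w in paths for k in range(len(w) + 1)}
    return reduce(join, (effect_of_path(w) for w in prefixes), LEAST)

def prefix_effect_closed_form(paths):
    """λx.x ⊔ B with B the join of all b_e occurring in the paths."""
    B = 0
    for w in paths:
        for _, b in w:
            B |= b
    return (TOP, B)
```

The empty prefix contributes the identity, and the prefix ending in edge $e$ contributes $b_e$ unmasked, which is exactly why only the $b_e$ survive.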

5 Effect Analysis 

Now we have all prerequisites together to present a constraint system for effect
analysis. The least solution of the constraint system defines values $[p]$ for
the effect of procedures $p$ together with values $[v]$ for the effects of same-level
execution paths reaching program point $v$.



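$$
\begin{array}{rclll}
[p] & \sqsupseteq & [r] & r \text{ return point of } p & (1) \\
{[s]} & \sqsupseteq & I & s \text{ entry point} & (2) \\
{[v]} & \sqsupseteq & f_e \circ [u] & e = (u, v) \text{ basic edge} & (3) \\
{[v]} & \sqsupseteq & [p] \circ [u] & e = (u, v) \text{ calls } p & (4) \\
{[v]} & \sqsupseteq & ([p_1] \otimes [p_2]) \circ [u] & e = (u, v) \text{ calls } p_1 \parallel p_2 & (5)
\end{array}
$$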
Lines (1) through (4) are the constraints that determine the effects of procedures as
known from inter-procedural analysis of sequential programs. Line (1) says that
the effect of procedure $p$ is the effect of what has been accumulated for the
return point of $p$. Line (2) says that accumulation of effects starts at entry
points of procedures with the identity function $I = \lambda x.x$. Line (3) says that the
contribution of a basic edge $e = (u, v)$ to the value for $v$ is given by the value
for $u$ extended by the application of the function $f_e$ associated with this edge.
Line (4) says that the contribution of an edge $e = (u, v)$ calling a procedure $p$ is
determined analogously, with the only difference that the function $f_e$ in line (3) is
now replaced with the effect $[p]$ of the called procedure. Line (5) for a parallel
call has the same form. But now, in order to determine the combined effect of
the procedures $p_1$ and $p_2$ executed in parallel, we rely on the interleaving operator
"$\otimes$". This constraint system for effect analysis is the direct abstraction of the
corresponding constraint system for same-level reaching paths from section 3.
Therefore, we obtain (by distributivity of all involved operators):

Theorem 1. The least solution of the effect constraint system precisely describes
the effect of procedures, i.e.,

$$\mathrm{Effect}(p) = [p] \quad \text{and} \quad \mathrm{Effect}(v) = [v]$$

for every procedure $p$ and program point $v$. These values can be computed in time
$O(h \cdot n)$ where $n$ is the size of the program. □
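
To make the effect constraint system concrete, here is a minimal solver sketch. It assumes a two-bit bitvector lattice, functions stored as pairs `(a, b)`, and a hypothetical program encoding (a dictionary of procedures with `entry`, `ret` and an edge list). It uses naive round-robin iteration, so it illustrates the constraints only and does not achieve the $O(h \cdot n)$ bound of Theorem 1; it also assumes that every procedure terminates.

```python
H = 2
TOP = (1 << H) - 1
IDENTITY, LEAST = (TOP, 0), (0, 0)

def compose(g, f):
    return (g[0] & f[0], (g[0] & f[1]) | g[1])

def join(f, g):
    return (f[0] | g[0], f[1] | g[1])

def interleave(f, g):
    return (f[0] & g[0], f[1] | g[1])

def solve_effects(procs):
    """procs: name -> {'entry': s, 'ret': r, 'edges': [...]} (assumed encoding).
    Edges: ('basic', u, v, f), ('call', u, v, p) or ('par', u, v, p1, p2)."""
    proc_eff = {p: LEAST for p in procs}
    point_eff = {}
    for info in procs.values():
        for e in info['edges']:
            point_eff.setdefault(e[1], LEAST)
            point_eff.setdefault(e[2], LEAST)
        point_eff.setdefault(info['ret'], LEAST)
        point_eff[info['entry']] = IDENTITY                  # constraint (2)
    changed = True
    while changed:
        changed = False
        for p, info in procs.items():
            for e in info['edges']:
                kind, u, v = e[0], e[1], e[2]
                if kind == 'basic':
                    step = e[3]                              # constraint (3)
                elif kind == 'call':
                    step = proc_eff[e[3]]                    # constraint (4)
                else:                                        # constraint (5)
                    step = interleave(proc_eff[e[3]], proc_eff[e[4]])
                new = join(point_eff[v], compose(step, point_eff[u]))
                if new != point_eff[v]:
                    point_eff[v], changed = new, True
            new = join(proc_eff[p], point_eff[info['ret']])  # constraint (1)
            if new != proc_eff[p]:
                proc_eff[p], changed = new, True
    return proc_eff, point_eff
```

For example, if `main` calls `p1 || p2`, where `p1` sets bit 0 and `p2` clears it, the combined effect preserves bit 1 and reports bit 0 as possibly set, since one interleaving ends with the bit set.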

6 A Constraint System for Reachability 

As for effect analysis, we could mimic the least fixpoint definition of the sets 
of reaching execution paths through a corresponding constraint system over F. 
Observe, however, that our defining constraint system for reaching execution 
paths in section 3 has quadratic size. Clearly, we would like to improve on this, 
and indeed this is possible - even without sacrificing precision. 

Instead of accumulating effects in a topdown fashion as was necessary in the pre- 
cise definition of reaching execution paths, we prefer a bottom-up accumulation 
- a strategy which is commonly used in inter-procedural analysis of sequential 
programs. There, accumulation directly starts at the main program and then 
successively proceeds to called procedures. 

For each program point $v$, let $B(v)$ denote the least upper bound of all $b_e$, for
all basic edges $e$ possibly executed in parallel with $v$. This value is also called
the possible interference of $v$. Formally, these values are determined through the
least solution of the following constraint system:



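$$
\begin{array}{rcll}
\sigma(p) & \sqsupseteq & b_e & e \text{ basic edge in procedure } p \\
\sigma(p) & \sqsupseteq & \sigma(q) & \text{procedure } p \text{ calls } q \text{ or } q \parallel \_ \text{ or } \_ \parallel q \\
B(v) & \sqsupseteq & B(p) & v \text{ program point in } p \\
B(p) & \sqsupseteq & B(u) & (u, \_) \text{ calls procedure } p \\
B(q_i) & \sqsupseteq & \sigma(q_{3-i}) \sqcup B(u) & (u, \_) \text{ calls } q_1 \parallel q_2
\end{array}
$$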
We used auxiliary values $\sigma(p)$, $p$ a procedure, to calculate the least upper bound
on $b_e$ for all basic edges possibly executed during evaluation of $p$. The whole
system for computing the values $\sigma(p)$, $B(p)$ and $B(v)$ is of linear size and uses
"$\sqcup$" as the only operation in right-hand sides. Problems of this kind are also known
as "pure merge problems" and can be solved even in linear time.
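
The interference system can be sketched as follows for bitvectors encoded as integers. The program encoding (per procedure: its program points, the $b_e$ of its basic edges, its ordinary calls and its parallel calls) is an assumption made for this illustration, and the naive iteration shown here is simpler than the linear-time algorithm for pure merge problems mentioned above.

```python
def solve_interference(procs):
    """procs: name -> {'points': [...], 'basic': [b_e, ...],
                       'calls': [(u, q), ...], 'par': [(u, q1, q2), ...]}
    (assumed encoding). Returns the tables sigma, B(p) and B(v)."""
    sigma = {p: 0 for p in procs}
    B_proc = {p: 0 for p in procs}
    B_point = {v: 0 for info in procs.values() for v in info['points']}
    changed = True

    def raise_to(table, key, value):
        """Enforce table[key] ⊒ value; record whether anything grew."""
        nonlocal changed
        if table[key] | value != table[key]:
            table[key] |= value
            changed = True

    while changed:
        changed = False
        for p, info in procs.items():
            for b in info.get('basic', []):
                raise_to(sigma, p, b)
            for _, q in info.get('calls', []):
                raise_to(sigma, p, sigma[q])
            for _, q1, q2 in info.get('par', []):
                raise_to(sigma, p, sigma[q1] | sigma[q2])
            for v in info['points']:
                raise_to(B_point, v, B_proc[p])
            for u, q in info.get('calls', []):
                raise_to(B_proc, q, B_point[u])
            for u, q1, q2 in info.get('par', []):
                raise_to(B_proc, q1, sigma[q2] | B_point[u])
                raise_to(B_proc, q2, sigma[q1] | B_point[u])
    return sigma, B_proc, B_point
```

Only bitwise "or" appears on right-hand sides, which mirrors the observation that "$\sqcup$" is the only operation needed.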

We will now construct a constraint system as for inter-procedural reachability
analysis of sequential programs, but for each program point additionally take its
possible interference into account. Thus, we consider the values $[\![v]\!]$, $v$ a program
point, and $[\![p]\!]$, $p$ a procedure, which are determined as the least solution of the
following constraint system:



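$$
\begin{array}{rclll}
[\![\mathit{main}]\!] & \sqsupseteq & d_0 & & (1) \\
{[\![v]\!]} & \sqsupseteq & B(v) & v \text{ program point} & (2) \\
{[\![v]\!]} & \sqsupseteq & [v]\,[\![p]\!] & v \text{ program point in procedure } p & (3) \\
{[\![p]\!]} & \sqsupseteq & [\![u]\!] & e = (u, \_) \text{ calls } p \text{ or } p \parallel \_ \text{ or } \_ \parallel p & (4)
\end{array}
$$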



Only line (2) distinguishes this system from a corresponding constraint system for
reachability in sequential programs. The intuition behind the constraint system
is as follows. Line (1) says that initially the value reaching main should subsume
the initial value $d_0$. Line (2) says that the value reaching program point $v$ should
subsume its possible interference. Line (3) says that when $v$ is a program point
of procedure $p$, then the reaching value should also subsume the intra-procedural
effect of $v$ applied to the value reaching $p$. Line (4) finally says that the value
reaching a procedure should subsume the value of every program point where
such a call (possibly in parallel to another call) is possible.

This constraint system differs considerably from the constraint system for the 
sets of reaching execution paths. Nonetheless, we are able to prove: 

Theorem 2. The above constraint system computes precise reachability
information as well, i.e.,

$$\mathrm{Reach}(p) = [\![p]\!] \quad \text{and} \quad \mathrm{Reach}(v) = [\![v]\!]$$

for all program points $v$ and procedures $p$. These values can be computed in time
$O(h \cdot n)$ where $n$ is the size of the program.

For a proof see appendix B. Theorem 2 implies that programs with procedures 
and parallelism are not harder to analyze than programs with procedures but 
without parallelism! 
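
The reachability constraints (1) through (4) can be sketched as a small fixpoint computation. The sketch assumes that the effects $[v]$ (as pairs `(a, b)`) and the interference values $B(v)$ are already available, that bitvectors are Python integers, and that the main procedure is named `'main'`; the input encoding and all names are hypothetical, and the naive iteration does not aim at the stated time bound.

```python
def apply(f, x):
    """Evaluate f = λx.(a ⊓ x) ⊔ b, stored as the pair (a, b)."""
    a, b = f
    return (a & x) | b

def solve_reach(d0, point_eff, interference, proc_of, call_sites):
    """point_eff[v]    : effect [v] as a pair (a, b)
    interference[v] : B(v)
    proc_of[v]      : procedure containing program point v
    call_sites[p]   : points u with an edge (u, _) calling p (alone or in parallel)"""
    reach_proc = {p: 0 for p in set(proc_of.values())}
    reach_proc['main'] = d0                       # constraint (1)
    reach_point = dict(interference)              # constraint (2)
    changed = True
    while changed:
        changed = False
        for v, p in proc_of.items():              # constraint (3)
            new = reach_point[v] | apply(point_eff[v], reach_proc[p])
            if new != reach_point[v]:
                reach_point[v], changed = new, True
        for p, sites in call_sites.items():       # constraint (4)
            for u in sites:
                new = reach_proc[p] | reach_point[u]
                if new != reach_proc[p]:
                    reach_proc[p], changed = new, True
    return reach_proc, reach_point
```

In the test data used for this sketch, `main` calls `p1 || p2`, `p1` sets bit 0 and `p2` sets bit 1; at the entry of `p1`, bit 1 is already reported because `p2` may have run ahead.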

7 Extensions 

In this section, we discuss issues which are important for the practical applica- 
bility of the presented results. We do not claim that this section contains any 
new ideas or constructions. Rather we want to emphasize that the construc- 
tions known from the inter-procedural analysis of sequential programs can be 
extended to parallel programs in a straight-forward way. 
7.1 Non-reachable Program Points

So far, we assumed that every program point is reachable by at least one execution
path. In order to show that this assumption is not vital, let $\mathcal{P}$ and $\mathcal{R}$ denote
the sets of possibly terminating procedures and reachable program points,
respectively. In order to compute these sets, we instantiate our generic analysis
with $\mathbb{D} = \{0 \sqsubset 1\}$ where for each basic edge $e$, the function $[e] = f_e$ is given
by $f_e = I = \lambda x.x$, and the initial value $d_0$ equals 1. The only functions from
$\mathbb{D} \to \mathbb{D}$ occurring during the analysis are $\lambda x.\bot$ and $I$. Both functions are strict,
i.e., map $\bot$ to $\bot$. Therefore, we obtain:

Proposition 4. For every procedure $p$ and program point $v$, the following holds:

1. $[v] = I$ iff $\Pi(v) \neq \emptyset$ and $[p] = I$ iff $\Pi(p) \neq \emptyset$;

2. $[\![v]\!] = 1$ iff $\Pi_r(v) \neq \emptyset$ and $[\![p]\!] = 1$ iff $\Pi_r(p) \neq \emptyset$.

In particular, $p \in \mathcal{P}$ iff $[p] = I$, and $v \in \mathcal{R}$ iff $[\![v]\!] = 1$. □

We conclude that the sets $\mathcal{P}$ and $\mathcal{R}$ can be computed in linear time.

A non-reachable program point should not influence any other program point.
Therefore, we modify the given cfg by removing all edges starting in program
points not in $\mathcal{R}$. By this edge removal, the sets of reaching execution paths have
not changed. Let us call the resulting cfg normalized. Then we obtain:

Theorem 3. Assume the cfg is normalized. Then for every program point $v$ and
procedure $p$,

1. $\mathrm{Effect}(v) = [v]$ and $\mathrm{Effect}(p) = [p]$;

2. $\mathrm{Reach}(v) = [\![v]\!]$ and $\mathrm{Reach}(p) = [\![p]\!]$. □

We conclude that, after the preprocessing step of normalization, our constraint 
systems will compute a safe approximation which is precise. 

Practical Remark: Normalization of the cfg may remove edges and thus some 
constraints from the constraint systems of the analysis. Therefore, omitting nor- 
malization may result in a less precise, but still safe analysis. 

7.2 Backward Analysis 

What we have discussed so far is called forward analysis. Examples of forward
analysis problems are reaching definitions, available expressions or simple constant
propagation. Other important analyses, however, determine the value at a
program point $v$ w.r.t. the possible future of $v$, i.e., the set of reverses of execution
paths possibly following a visit of $v$. Examples are live variables or very busy
expressions. Such analyses are called backward analyses. If every forward
reachable program point is also backward reachable, i.e., lies on an
execution path from the start point to the return point of main, we can reduce
backward analysis to forward analysis, simply by normalizing the cfg followed
by a reversal of edge orientations and an exchange of entry and return points of
procedures.
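
The reversal step can be sketched in a few lines. The program encoding below (procedures with `entry`, `ret` and edges of the form `(kind, u, v, ...)`) is an assumption made for illustration; normalization is assumed to have been performed beforehand, and the edge functions of the backward problem would still have to be supplied by the analysis at hand.

```python
def reverse_cfg(procs):
    """Swap entry and return points and flip the orientation of every edge.
    procs: name -> {'entry': s, 'ret': r, 'edges': [(kind, u, v, ...), ...]}
    (assumed encoding; any payload after u and v is kept unchanged)."""
    reversed_procs = {}
    for name, info in procs.items():
        reversed_procs[name] = {
            'entry': info['ret'],
            'ret': info['entry'],
            'edges': [(e[0], e[2], e[1]) + tuple(e[3:]) for e in info['edges']],
        }
    return reversed_procs
```

Applying the function twice yields the original cfg again, which is a quick way to check an encoding.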
7.3 Local and Global State

Consider an edge $e = (u, v)$ in the cfg which calls a terminating procedure $p$
(the treatment of a terminating parallel call to two procedures $p_1$ and $p_2$ is
completely analogous). So far, the complete information at program point $u$ is
passed to the entry point of $p$. Indeed, this is adequate when analyzing global
properties like availability of expressions which depend on global variables only.
It is not (immediately) applicable in the presence of local variables which are visible
to the caller but should be hidden from the callee $p$, meaning that they should
survive the call unchanged [8, 13].

To make things precise, let us assume that $\mathbb{D} = \mathbb{D}_l \times \mathbb{D}_g$ where $\mathbb{D}_l$ and $\mathbb{D}_g$ describe
local and global properties, respectively. Let us further assume that the global
part of the current state is passed as a parameter to $p$, and also returned as the
result of the call, whereas the local part of the program point before the call
by-passes the call using some transformer $\beta_e : \mathbb{D}_l \to \mathbb{D}_l$. Recall that every $f \in \mathbb{F}$
is of the form $f = \lambda x.(x \sqcap a) \sqcup b$ with $a, b \in \mathbb{D}$. Since $\mathbb{D}$ is a Cartesian product,
this implies that $f = f_l \times f_g$ where $f_l : \mathbb{D}_l \to \mathbb{D}_l$ and $f_g : \mathbb{D}_g \to \mathbb{D}_g$ independently
operate on the local states and global states, respectively.

Therefore, we can separate the analysis into two phases.

The first phase considers just global values from $\mathbb{D}_g$. No local state needs to be
preserved during the call, and we use the original call edge.

The second phase then is purely intra-procedural and deals with the lattice $\mathbb{D}_l$.
But now, since the call at edge $e$ has no effect on the local state, we simply
change $e$ into a basic edge with $[e] = \beta_e$.

8 Conclusion and Perspectives 

We have shown how to extend the intra-procedural method of [11] to uniformly 
and efficiently capture inter-procedural bitvector analyses of fork/join parallel 
programs. Our method, which comprises analysis problems like available expres- 
sions, live variables or simple constant propagation, passes the test for prac- 
ticality, as it ‘behaves’ as the widely accepted algorithms for sequential inter- 
procedural program analysis. Moreover, even though precision can only be pro- 
ved for fork/join parallelism, our algorithm may also be used for computing safe 
approximations for languages with arbitrary synchronization statements. Finally, 
due to its structural similarity to the sequential case, it can easily be integrated 
in program analysis environments like e.g. MetaFrame or PAG, which already 
contain the necessary fixpoint machinery. 

As a next step, we plan a closer comparison with the automata theoretic
approach of Esparza and Podelski. The considered program structures are obviously
similar; however, the range of possible analyses may be different. As shown in
[12], the automata theoretic approach is able to capture the model checking
problem for all of the linear time temporal logic EF. It would be interesting to see
whether it is possible to adapt our technique to cover this logic as well, or
whether the automata theoretic approach, which is significantly more complex
already for the analysis problem considered here, is inherently more powerful.
A Proof of Proposition 2

We only prove statement (3). Let $M_1, M_2$ be non-empty subsets of $E^*$. By
statement (1), we have

$$
\begin{array}{rcl}
[M_1] \otimes [M_2] & = & [M_1] \circ [M_2] \sqcup [M_2] \circ [M_1] \\
 & = & [M_2 \cdot M_1 \cup M_1 \cdot M_2] \sqsubseteq [M_1 \otimes M_2]
\end{array}
$$

Therefore, it remains to prove the reverse inequality. For that consider
$w = e_1 \ldots e_m \in M_1 \otimes M_2$ where for disjoint index sets $I_1, I_2$ with $I_1 \cup I_2 = \{1, \ldots, m\}$,
$w_i = w|_{I_i} \in M_i$. We claim:

$$[w] \sqsubseteq [w_1] \circ [w_2] \sqcup [w_2] \circ [w_1]$$

Clearly, this claim implies statement (3) of our proposition. In order to prove
the claim, let $[e_i] = \lambda x.(a_i \sqcap x) \sqcup b_i$, $i = 1, \ldots, m$, $[w_j] = \lambda x.(A_j \sqcap x) \sqcup B_j$, $j = 1, 2$,
and $[w] = \lambda x.(A \sqcap x) \sqcup B$. Then by definition,

$$A = a_1 \sqcap \ldots \sqcap a_m = A_1 \sqcap A_2$$

Now consider the value $B$. By definition,

$$B = \bigsqcup_{k=1}^{m} (b_k \sqcap a_{k+1} \sqcap \ldots \sqcap a_m)$$

We will show that for every $k$,

$$b_k \sqcap a_{k+1} \sqcap \ldots \sqcap a_m \sqsubseteq B_1 \sqcup B_2$$

W.l.o.g. assume that $k \in I_1$ (the case where $k \in I_2$ is completely analogous) and
let $\{j_1, \ldots, j_r\} = \{j \in I_1 \mid j > k\}$. Then

$$b_k \sqcap a_{k+1} \sqcap \ldots \sqcap a_m \sqsubseteq b_k \sqcap a_{j_1} \sqcap \ldots \sqcap a_{j_r} \sqsubseteq B_1$$

which implies the assertion. □

B Proof of Theorem 2 

Let us start with the following simple but useful observation: 

Proposition 5. For every $f \in \mathbb{F}$, $b \in \mathbb{D}$ and $\Delta = \lambda x.x \sqcup b$,

$$f \otimes \Delta = \Delta \circ f$$ □

Next, we reformulate the constraint system for reachability as follows. We
introduce the new values $[\![v]\!]'$, $v$ a program point, and $[\![p]\!]'$, $p$ a procedure, which
collect the least upper bounds of directly reaching values by ignoring possible
interleavings with execution paths possibly executed in parallel to $v$ (or $p$). These
values are determined as the least solution of the following constraint system:

$$
\begin{array}{rclll}
[\![\mathit{main}]\!]' & \sqsupseteq & d_0 & & (1) \\
{[\![v]\!]'} & \sqsupseteq & [v]\,[\![p]\!]' & v \text{ program point in procedure } p & (2) \\
{[\![p]\!]'} & \sqsupseteq & [\![u]\!]' & e = (u, \_) \text{ calls } p \text{ or } p \parallel \_ \text{ or } \_ \parallel p & (3)
\end{array}
$$

By standard fixpoint induction we find:

Proposition 6. For all program points $v$ and procedures $p$,

$$[\![v]\!] = [\![v]\!]' \sqcup B(v) \quad \text{and} \quad [\![p]\!] = [\![p]\!]' \sqcup B(p)$$ □



In order to understand the "nature" of the values $B(v)$, we consider the sets
$P(v)$ of edges possibly executed in parallel with program points $v$. They are
determined through the least solution of the following constraint system:

$$
\begin{array}{rcll}
E(p) & \supseteq & \{e\} & e \text{ basic edge in procedure } p \\
E(p) & \supseteq & E(q) & \text{procedure } p \text{ calls } q \text{ or } q \parallel \_ \text{ or } \_ \parallel q \\
P(v) & \supseteq & P(p) & v \text{ program point in } p \\
P(p) & \supseteq & P(u) & (u, \_) \text{ calls procedure } p \\
P(q_i) & \supseteq & E(q_{3-i}) \cup P(u) & (u, \_) \text{ calls } q_1 \parallel q_2
\end{array}
$$

By comparison of this constraint system with the definition of the values $\sigma(p)$
and $B(v)$, $B(p)$ in section 6, we obtain:

Proposition 7. For every procedure $p$ and program point $v$,

1. $\lambda x.x \sqcup \sigma(p) = I \sqcup [E(p)]$;

2. $\lambda x.x \sqcup B(p) = I \sqcup [P(p)]$ and $\lambda x.x \sqcup B(v) = I \sqcup [P(v)]$. □

Moreover, we have:

Proposition 8. For every procedure $p$,

$$[pre(\Pi(p))] = I \sqcup [E(p)] = \lambda x.x \sqcup \sigma(p)$$ □

In order to simplify the proof of theorem 2, let us assume that all calls are parallel
calls $q_1 \parallel q_2$. This assumption does not incur a restriction, since an ordinary call
to a procedure $p$ can easily be simulated by a call to $p \parallel q_0$ where $q_0$ is a procedure
with just a single program point and no edges at all. Furthermore, it suffices to
prove the assertion of the theorem just for program points $v$ (the assertion for
procedures then is an immediate consequence). We want to prove that for every
program point $v$, the value $[\![v]\!]$ is a safe approximation of the value $\mathrm{Reach}(v)$,
i.e., $[\![v]\!] \sqsupseteq \mathrm{Reach}(v)$. By definition,

$$\mathrm{Reach}(v) = [\Pi_r(v)]\,d_0 = [\Pi(v, \mathit{main})]\,d_0$$

Therefore, let $w \in \Pi(v, \mathit{main})$. Then there are program points $u_0, \ldots, u_m$,
execution paths $w_0, \ldots, w_m$ together with execution paths $w'_i$, procedures $q_1^{(i)}, q_2^{(i)}$
and indices $j(i) \in \{1, 2\}$ for $i = 1, \ldots, m$ such that:

- $u_m = v$;

- $w_i \in \Pi(u_i)$ for $i = 0, \ldots, m$;

- there are calls $(u_{i-1}, \_)$ to $q_1^{(i)} \parallel q_2^{(i)}$;

- $u_0$ is a program point in main and for $i > 0$, $u_i$ is a program point in $q^{(i)}_{j(i)}$;

- $w'_i \in pre(\Pi(q^{(i)}_{3-j(i)}))$ for $i = 1, \ldots, m$;

- $w \in \{w_0\} \cdot (\{w'_1\} \otimes (\{w_1\} \cdot (\ldots \otimes (\{w_{m-1}\} \cdot (\{w'_m\} \otimes \{w_m\})) \ldots)))$.

Let $\Delta = \lambda x.x \sqcup B(v)$. Then by proposition 8,

$$[w'_i] \sqsubseteq [pre(\Pi(q^{(i)}_{3-j(i)}))] = I \sqcup [E(q^{(i)}_{3-j(i)})] \sqsubseteq I \sqcup [P(v)] = \Delta$$

for all $i = 1, \ldots, m$. Therefore by proposition 5,

$$
\begin{array}{rcl}
[w] & \sqsubseteq & (((\ldots (([w_m] \otimes [w'_m]) \circ [w_{m-1}]) \otimes \ldots) \circ [w_1]) \otimes [w'_1]) \circ [w_0] \\
 & \sqsubseteq & (((\ldots (([w_m] \otimes \Delta) \circ [w_{m-1}]) \otimes \Delta \ldots) \circ [w_1]) \otimes \Delta) \circ [w_0] \\
 & = & \Delta \circ (\ldots (([w_m] \otimes \Delta) \circ [w_{m-1}]) \otimes \Delta \ldots) \circ [w_1] \circ [w_0] \\
 & = & \Delta \circ [w_m] \circ [w_{m-1}] \circ \ldots \circ [w_0]
\end{array}
$$

Since $([w_m] \circ \ldots \circ [w_0])\,d_0 \sqsubseteq [\![v]\!]'$, we conclude that

$$[w]\,d_0 \sqsubseteq \Delta\,[\![v]\!]' = [\![v]\!]' \sqcup B(v) = [\![v]\!]$$

which we wanted to prove.

It remains to prove the reverse inequality, i.e., that (1) $[\![v]\!]' \sqsubseteq \mathrm{Reach}(v)$ and (2)
$B(v) \sqsubseteq \mathrm{Reach}(v)$.

Let us first consider inequality (1). The value $[\![v]\!]'$ is the least upper bound on
values $[w]\,d_0$ such that there exist program points $u_0, \ldots, u_m$, execution paths
$w_0, \ldots, w_m$ together with procedures $q_1^{(i)}, q_2^{(i)}$ and indices $j(i) \in \{1, 2\}$ for
$i = 1, \ldots, m$ such that:

- $u_m = v$;

- $w_i \in \Pi(u_i)$ for $i = 0, \ldots, m$;

- there are calls $(u_{i-1}, \_)$ to $q_1^{(i)} \parallel q_2^{(i)}$;

- $u_0$ is a program point in main and for $i > 0$, $u_i$ is a program point in $q^{(i)}_{j(i)}$;

- $w = w_0 \ldots w_m$.

By induction on $r = m - i$ (from $r = 0$ to $r = m - 1$), we find that for $i > 0$,

$$w_i \ldots w_m \in \Pi(v, q^{(i)}_{j(i)})$$

and for $i = 0$,

$$w = w_0 \ldots w_m \in \Pi(v, \mathit{main}) = \Pi_r(v)$$

Therefore,

$$[w]\,d_0 \sqsubseteq [\Pi_r(v)]\,d_0 = \mathrm{Reach}(v)$$

which we wanted to prove.

Now let us consider inequality (2). By proposition 7, $\lambda x.x \sqcup B(v) = I \sqcup [P(v)]$.
Therefore, it suffices to prove for each edge $e \in P(v)$ that $b_e \sqsubseteq \mathrm{Reach}(v)$.

Since $e \in P(v)$, there exist program points $u_0, \ldots, u_m$, execution paths $w_0, \ldots, w_m$
together with procedures $q_1^{(i)}, q_2^{(i)}$, indices $j(i) \in \{1, 2\}$ for $i = 1, \ldots, m$, an index
$k \in \{1, \ldots, m\}$ and one execution path $w'$ such that

- $u_m = v$;

- $w_i \in \Pi(u_i)$ for $i = 0, \ldots, m$;

- there are calls $(u_{i-1}, \_)$ to $q_1^{(i)} \parallel q_2^{(i)}$;

- $u_0$ is a program point in main and for $i > 0$, $u_i$ is a program point in $q^{(i)}_{j(i)}$;

- $w'e \in pre(\Pi(q^{(k)}_{3-j(k)}))$.

As above, we conclude that $w_k \ldots w_m \in \Pi(v, q^{(k)}_{j(k)})$. By definition, then also

$$w_{k-1} w_k \ldots w_m w'e \in \Pi(v, q^{(k-1)}_{j(k-1)})$$

(where in case $k = 1$, we let $q^{(0)}_{j(0)} = \mathit{main}$) and therefore also

$$w_0 \ldots w_{k-1} w_k \ldots w_m w'e \in \Pi(v, \mathit{main}) = \Pi_r(v)$$

We conclude that

$$b_e \sqsubseteq b_e \sqcup (a_e \sqcap ([w_0 \ldots w_m w']\,d_0)) = [w_0 \ldots w_m w'e]\,d_0 \sqsubseteq [\Pi_r(v)]\,d_0 = \mathrm{Reach}(v)$$

which completes the proof. □
Abstract. Linear type systems allow destructive operations such as object
deallocation and imperative updates of functional data structures.
These operations and others, such as the ability to reuse memory at
different types, are essential in low-level typed languages. However,
traditional linear type systems are too restrictive for use in low-level code
where it is necessary to exploit pointer aliasing. We present a new typed
language that allows functions to specify the shape of the store that they
expect and to track the flow of pointers through a computation. Our type
system is expressive enough to represent pointer aliasing and yet safely
permit destructive operations.



1 Introduction 

Linear type systems [26, 25] give programmers explicit control over memory 
resources. The critical invariant of a linear type system is that every linear value 
is used exactly once. After its single use, a linear value is dead and the system 
can immediately reclaim its space or reuse it to store another value. Although 
this single-use invariant enables compile-time garbage collection and imperative 
updates to functional data structures, it also limits the use of linear values. For 
example, x is used twice in the following expression: let x = (1,2) in let y =
fst(x) in let z = snd(x) in y + z. Therefore, x cannot be given a linear type,
and consequently, cannot be deallocated early. 
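
The single-use invariant can be illustrated with a tiny occurrence counter. This sketch assumes a hypothetical tuple-based encoding of expressions (`('var', x)`, `('lit', n)`, `('let', x, bound, body)`, and operator nodes such as `('pair', e1, e2)` or `('fst', e)`); it is not the type system of this paper, only a way to see why x fails the "exactly once" test in the expression above.

```python
def uses(expr, name):
    """Count the free occurrences of variable `name` in `expr`."""
    kind = expr[0]
    if kind == 'var':
        return 1 if expr[1] == name else 0
    if kind == 'lit':
        return 0
    if kind == 'let':                       # ('let', x, bound, body)
        _, x, bound, body = expr
        inner = 0 if x == name else uses(body, name)   # x shadows name
        return uses(bound, name) + inner
    # Operator nodes such as ('pair', e1, e2), ('fst', e), ('add', e1, e2).
    return sum(uses(sub, name) for sub in expr[1:])

def is_used_linearly(let_expr):
    """True if the variable bound by a let is used exactly once in its body."""
    _, x, _, body = let_expr
    return uses(body, x) == 1

# let x = (1,2) in let y = fst(x) in let z = snd(x) in y + z
example = ('let', 'x', ('pair', ('lit', 1), ('lit', 2)),
           ('let', 'y', ('fst', ('var', 'x')),
            ('let', 'z', ('snd', ('var', 'x')),
             ('add', ('var', 'y'), ('var', 'z')))))
```

Here `uses(example[3], 'x')` counts two occurrences, so `is_used_linearly(example)` is false, while the inner bindings of y and z are each used once.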

Several authors [26, 9, 3] have extended pure linear type systems to allow 
greater flexibility. However, most of these efforts have focused on high-level user 
programming languages and as a result, they have emphasized simple typing 
rules that programmers can understand and/or typing rules that admit effective 
type inference techniques. These issues are less important for low-level typed 
languages designed as compiler intermediate languages [22, 18] or as secure
mobile code platforms, such as the Java Virtual Machine [10], Proof-Carrying Code
(PCC) [13] or Typed Assembly Language (TAL) [12]. These languages are
designed for machine, not human, consumption. On the other hand, because systems
such as PCC and TAL make every machine operation explicit and verify that 
each is safe, the implementation of these systems requires new type-theoretic 
mechanisms to make efficient use of computer resources. 

* This material is based on work supported in part by the AFOSR grant F49620-97- 
1-0013 and the National Science Foundation under Grant No. EIA 97-03470. Any 
opinions, findings, and conclusions or recommendations expressed in this publication 
are those of the authors and do not reflect the views of these agencies. 

G. Smolka (Ed.): ESOP/ETAPS 2000, LNCS 1782, pp. 366-381, 2000. 

© Springer-Verlag Berlin Heidelberg 2000
In existing high-level typed languages, every location is stamped with a single
type for the lifetime of the program. Failing to maintain this invariant has
resulted in unsound type systems or misfeatures (witness the interaction between
parametric polymorphism and references in ML [23, 27]). In low-level
languages that aim to expose the resources of the underlying machine, this
invariant is untenable. For instance, because machines contain a limited number
of registers, each register cannot be stamped with a single type. Also, when
two stack-allocated objects have disjoint lifetimes, compilers naturally reuse the
stack space, even when the two objects have different types. Finally, in a
low-level language exposing initialization, even the simplest objects change type. For
example, a pair x of type ⟨int, int⟩ may be created as follows:

malloc x,2 ; (* x has type ⟨junk, junk⟩ *)
x[1]:=1 ; (* x has type ⟨int, junk⟩ *)
x[2]:=2 ; (* x has type ⟨int, int⟩ *)



At each step in this computation, the storage bound to x takes on a different 
type ranging from nonsense (indicated by the type junk) to a fully initialized pair 
of integers. In this simple example, there are no aliases of the pair and therefore 
we might be able to use linear types to verify that the code is safe. However, 
in a more complex example, a compiler might generate code to compute the 
initial values of the tuple fields between allocation and the initializing assign- 
ments. During the computation, a register allocator may be forced to move the 
uninitialized or partially initialized value x between stack slots and registers, 
creating aliases: 




If x is a linear value, one of the pointers shown above would have to be
“invalidated” in some way after each move. Unfortunately, assuming the pointer 
on the stack is invalidated, future register pressure may force x to be physically 
copied back onto the stack. Although this additional copy is unnecessary because 
the register allocator can easily remember that a pointer to the data structure 
remains on the stack, the limitations of a pure linear type system require it. 

Pointer aliasing and data sharing also occur naturally in other data structures 
introduced by a compiler. For example, compilers often use a top-of-stack pointer 
and a frame pointer, both of which point to the same data structure. Compiling 
a language like Pascal using displays [1] generalizes this problem to having an 
arbitrary (but statically known) number of pointers into the same data structure. 
In each of these examples, a flexible type system will allow aliasing but ensure 
that no inconsistencies arise. Type systems for low-level languages, therefore, 
should support values whose types change even when those values are aliased. 
We have devised a new type system that uses linear reasoning to allow
memory reuse at different types, object initialization, safe deallocation, and tracking
of sharing in data structures. This paper formalizes the type system and
provides a theoretical foundation for safely integrating operations that depend upon
pointer aliasing with type systems that include polymorphism and higher-order
functions.

We have extended the TAL implementation with the features described in
this paper.¹ It was quite straightforward to augment the existing $F_\omega$-based type
system because many of the basic mechanisms, including polymorphism and
singleton types, were already present in the type constructor language. Popcorn,
an optimizing compiler for a safe C-like language, generates code for the new
TAL type system and uses the alias tracking features of our type system.

The Popcorn compiler and TAL implementation demonstrate that the ideas 
presented in this paper can be integrated with a practical and complete pro- 
gramming language. However, for the sake of clarity, we only present a small 
fragment of our type system and, rather than formalizing it in the context of 
TAL, we present our ideas in terms of a more familiar lambda calculus. Section 2 
gives an informal overview of how to use aliasing constraints, a notion which ex- 
tends conventional linear type systems, to admit destructive operations such 
as object deallocation in the presence of aliasing. Section 3 describes the core 
language formally, with emphasis on the rules for manipulating linear aliasing 
constraints. Section 4 extends the language with non-linear aliasing constraints. 
Finally, Section 5 discusses future and related work. 



2 Informal Overview 

The main feature of our new type system is a collection of aliasing constraints. 
Aliasing constraints describe the shape of the store and every function uses them 
to specify the store that it expects. If the current store does not conform to the 
constraints specified, then the type system ensures that the function cannot 
be called. To illustrate how our constraints abstract a concrete store, we will 
consider the following example: 




Here, sp is a pointer to a stack frame, which has been allocated on the heap (as
might be done in the SML/NJ compiler [2], for instance). This frame contains a
pointer to a second object, which is also pointed to by register $r_1$.

¹ See http://www.cs.cornell.edu/talc for the latest software release.

In our program model, every heap-allocated object occupies a particular
memory location. For example, the stack frame might occupy location $\ell_s$ and the
second object might occupy location $\ell_o$. In order to track the flow of pointers to
these locations accurately, we reflect locations into the type system: A pointer
to a location $\ell$ is given the singleton type $ptr(\ell)$. Each singleton type contains
exactly one value (the pointer in question). This property allows the type
system to reason about pointers in a very fine-grained way. In fact, it allows us to
represent the graph structure of our example store precisely:

[Figure: store diagram with the labels SP, STACK, INT, BOOL and PTR($\ell_o$)]

We represent this picture in our formal syntax by declaring the program variable
sp to have type $ptr(\ell_s)$ and $r_1$ to have type $ptr(\ell_o)$. The store itself is described
by the constraints $\{\ell_s \mapsto \langle int, bool, ptr(\ell_o)\rangle\} \oplus \{\ell_o \mapsto \langle int\rangle\}$, where the type
$\langle \tau_1, \ldots, \tau_n \rangle$ denotes a memory block containing values with types $\tau_1$ through $\tau_n$.

Constraints of the form $\{\ell \mapsto \tau\}$ are a reasonable starting point for an
abstraction of the store. However, they are actually too precise to be useful
for general-purpose programs. Consider, for example, the simple function deref,
which retrieves an integer from a reference cell. There are two immediate
problems if we demand that code call deref when the store has a shape described
by $\{\ell \mapsto \langle int\rangle\}$. First, deref can only be used to dereference the location $\ell$, and
not, for example, the locations $\ell'$ or $\ell''$. This problem is easily solved by adding
location polymorphism. The exact name of a location is usually unimportant; we
need only establish a dependence between pointer type and constraint. Hence
we could specify that deref requires a store $\{\rho \mapsto \langle int\rangle\}$ where $\rho$ is a location
variable instead of some specific location $\ell$. Second, the constraint $\{\ell \mapsto \langle int\rangle\}$
specifies a store with exactly one location $\ell$ although we may want to dereference
a single integer reference amongst a sea of other heap-allocated objects. Since
deref does not use or modify any of these other references, we should be able
to abstract away the size and shape of the rest of the store. We accomplish this
task using store polymorphism. An appropriate constraint for the function deref
is $\epsilon \oplus \{\rho \mapsto \langle int\rangle\}$ where $\epsilon$ is a constraint variable that may be instantiated with
any other constraint.

The third main feature of our constraint language is the capability to distinguish
between linear constraints $\{\rho \mapsto \tau\}$ and non-linear constraints $\{\rho \mapsto \tau\}^{\omega}$.
Linear constraints come with the additional guarantee that the location on the
left-hand side of the constraint ($\rho$) is not aliased by any other location ($\rho'$). This
invariant is maintained despite the presence of location polymorphism and store
polymorphism. Intuitively, because $\rho$ is unaliased, we can safely deallocate its
memory or change the types of the values stored there. The key property that makes
our system more expressive than traditional linear systems is that although
the aliasing constraints may be linear, the pointer values that flow through a
computation are not. Hence, there is no direct restriction on the copying and
reuse of pointers.
The following example illustrates how the type system uses aliasing constraints
and singleton types to track the evolution of the store across a series of
instructions that allocate, initialize, and then deallocate storage. In this example,
the instruction malloc $x, \rho, n$ allocates $n$ words of storage. The new storage
is allocated at a fresh location $\ell$ in the heap and $\ell$ is substituted for $\rho$ in the
remaining instructions. A pointer to $\ell$ is substituted for $x$. Both $\rho$ and $x$ are
considered bound by this instruction. The free instruction deallocates storage.
Deallocated storage has type junk and the type system prevents any future use
of that space.

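   Instructions                Constraints (initially the constraints are $\epsilon$)

1. malloc $sp, \rho_1, 2$;     $\epsilon \oplus \{\rho_1 \mapsto \langle \mathit{junk}, \mathit{junk} \rangle\}$, $sp : \mathit{ptr}(\rho_1)$
2. $sp[1] := 1$;               $\epsilon \oplus \{\rho_1 \mapsto \langle \mathit{int}, \mathit{junk} \rangle\}$
3. malloc $r_1, \rho_2, 1$;    $\epsilon \oplus \{\rho_1 \mapsto \langle \mathit{int}, \mathit{junk} \rangle, \rho_2 \mapsto \langle \mathit{junk} \rangle\}$, $r_1 : \mathit{ptr}(\rho_2)$
4. $sp[2] := r_1$;             $\epsilon \oplus \{\rho_1 \mapsto \langle \mathit{int}, \mathit{ptr}(\rho_2) \rangle, \rho_2 \mapsto \langle \mathit{junk} \rangle\}$
5. $r_1[1] := 2$;              $\epsilon \oplus \{\rho_1 \mapsto \langle \mathit{int}, \mathit{ptr}(\rho_2) \rangle, \rho_2 \mapsto \langle \mathit{int} \rangle\}$
6. free $r_1$;                 $\epsilon \oplus \{\rho_1 \mapsto \langle \mathit{int}, \mathit{ptr}(\rho_2) \rangle, \rho_2 \mapsto \mathit{junk}\}$
7. free $sp$;                  $\epsilon \oplus \{\rho_1 \mapsto \mathit{junk}, \rho_2 \mapsto \mathit{junk}\}$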

Again, we can intuitively think of $sp$ as the stack pointer and $r_1$ as a register
that holds an alias of an object on the stack. Notice that on line 5, the initialization
of $r_1$ updates the type of the memory at location $\rho_2$. This has the effect
of simultaneously updating the type of $r_1$ and of $sp[2]$. Both of these paths are
similarly affected when $r_1$ is freed in the next instruction. Despite the presence
of the dangling pointer at $sp[2]$, the type system will not allow that pointer to
be dereferenced.

By using singleton types to accurately track pointers, and aliasing constraints 
to model the shape of the store, our type system can represent sharing and 
simultaneously ensure safety in the presence of destructive operations. 
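The bookkeeping in the seven-step example can be replayed with a small checker. The following Python sketch is only an illustration under stated assumptions, not the authors' implementation: the names `Checker`, `IllTyped`, `malloc`, `assign`, `load`, and `free` are hypothetical, only linear constraints are modelled, and the constraint variable $\epsilon$ (the rest of the store) is left implicit.

```python
JUNK = "junk"  # type of uninitialized or deallocated storage


class IllTyped(Exception):
    """Raised when this toy checker rejects an instruction."""


class Checker:
    def __init__(self):
        self.constraints = {}  # location variable -> list of field types, or JUNK
        self.env = {}          # pointer variable -> location variable it points to

    def _block(self, x, i=None):
        if x not in self.env:
            raise IllTyped(f"{x} is not a pointer")
        rho = self.env[x]
        block = self.constraints[rho]
        if block == JUNK:
            raise IllTyped(f"{x} points to deallocated location {rho}")
        if i is not None and not 1 <= i <= len(block):
            raise IllTyped(f"field {i} is out of range")
        return rho, block

    def malloc(self, x, rho, n):
        if rho in self.constraints:
            raise IllTyped(f"{rho} is not fresh")
        self.constraints[rho] = [JUNK] * n
        self.env[x] = rho  # x : ptr(rho)

    def assign(self, x, i, field_type):
        # x[i] := v, where v has type field_type (a strong update)
        _, block = self._block(x, i)
        block[i - 1] = field_type

    def load(self, y, x, i):
        # y = x[i]; only pointer-typed results are recorded in env
        _, block = self._block(x, i)
        field = block[i - 1]
        if isinstance(field, tuple) and field[0] == "ptr":
            self.env[y] = field[1]
        return field

    def free(self, x):
        rho, _ = self._block(x)
        self.constraints[rho] = JUNK


c = Checker()
c.malloc("sp", "rho1", 2)            # step 1
c.assign("sp", 1, "int")             # step 2
c.malloc("r1", "rho2", 1)            # step 3
c.assign("sp", 2, ("ptr", "rho2"))   # step 4
c.assign("r1", 1, "int")             # step 5
c.free("r1")                         # step 6
after_step_6 = {k: (v if v == JUNK else tuple(v)) for k, v in c.constraints.items()}

c.load("alias", "sp", 2)             # the dangling pointer itself may be copied
try:
    c.load("z", "alias", 1)          # but dereferencing it is rejected
    dangling_rejected = False
except IllTyped:
    dangling_rejected = True
c.free("sp")                         # step 7
```

Because the constraint for `rho2` is stored once and both `r1` and the field `sp[2]` merely name it, updating or freeing through one path is automatically visible through the other.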

3 The Language of Locations 

This section describes our new type-safe “language of locations” formally. The 
syntax for the language appears in Figure 1. 

3.1 Values, Instructions, and Programs 

A program is a pair of a store ($S$) and a list of instructions ($\iota$). The store maps
locations ($\ell$) to values ($v$). Normally, the values held in the store are memory
blocks ($\langle v_1, \ldots, v_n \rangle$), but after the memory at a location has been deallocated,
that location will point to the unusable value junk. Other values include integer
constants ($i$), variables ($x$ or $f$), and, of course, pointers ($\mathtt{ptr}(\ell)$).

Figure 2 formally defines the operational semantics of the language.^ The
main instructions of interest manipulate memory blocks. The instruction

^ Here and elsewhere, the notation $X[c_1, \ldots, c_n / x_1, \ldots, x_n]$ denotes capture-avoiding
substitution of $c_1, \ldots, c_n$ for variables $x_1, \ldots, x_n$ in $X$.
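$\ell \in \mathit{Locations}$    $\rho \in \mathit{LocationVar}$    $\epsilon \in \mathit{ConstraintVar}$    $x, f \in \mathit{ValueVar}$

locations      $\eta ::= \ell \mid \rho$
constraints    $C ::= \emptyset \mid \epsilon \mid \{\eta \mapsto \tau\} \mid C_1 \oplus C_2$
types          $\tau ::= \mathit{int} \mid \mathit{junk} \mid \mathit{ptr}(\eta) \mid \langle \tau_1, \ldots, \tau_n \rangle \mid \forall[\Delta; C].(\tau_1, \ldots, \tau_n) \to 0$
value ctxts    $\Gamma ::= \cdot \mid \Gamma, x{:}\tau$
type ctxts     $\Delta ::= \cdot \mid \Delta, \rho \mid \Delta, \epsilon$
values         $v ::= x \mid i \mid \mathtt{junk} \mid \mathtt{ptr}(\ell) \mid \langle v_1, \ldots, v_n \rangle \mid \mathtt{fix}\ f[\Delta; C; \Gamma].\iota \mid v[\eta] \mid v[C]$
instructions   $\iota ::= \mathtt{malloc}\ x, \rho, n;\ \iota \mid x = v[i];\ \iota \mid v[i] := v';\ \iota \mid \mathtt{free}\ v;\ \iota \mid v(v_1, \ldots, v_n) \mid \mathtt{halt}$
stores         $S ::= \{\ell_1 \mapsto v_1, \ldots, \ell_n \mapsto v_n\}$
programs       $P ::= (S, \iota)$

Fig. 1. Language of Locations: Syntax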



malloc $x, \rho, n$ allocates an uninitialized memory block (filled with junk) of size $n$
at a new location $\ell$ and binds $x$ to the pointer $\mathtt{ptr}(\ell)$. The location variable $\rho$,
bound by this instruction, is the static representation of the dynamic location $\ell$.
The instruction $x = v[i]$ binds $x$ to the $i$th component of the memory block
pointed to by $v$ in the remaining instructions. The instruction $v[i] := v'$ stores $v'$
in the $i$th component of the block pointed to by $v$. The final memory management
primitive, free $v$, deallocates the storage pointed to by $v$. If $v$ is the
pointer $\mathtt{ptr}(\ell)$ then deallocation is modeled by updating the store ($S$) so that
the location $\ell$ maps to junk.

The program $(\{\},\ \mathtt{malloc}\ x, \rho, 2;\ x[1] := 3;\ x[2] := 5;\ \mathtt{free}\ x;\ \mathtt{halt})$ allocates,
initializes and finally deallocates a pair of integers. Its evaluation is shown below:



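Store                                                   Instructions

$\{\}$                                                  malloc $x, \rho, 2$          (* allocate new location $\ell$, *)
                                                                                     (* substitute $\mathtt{ptr}(\ell), \ell$ for $x, \rho$ *)
$\{\ell \mapsto \langle \mathtt{junk}, \mathtt{junk} \rangle\}$   $\mathtt{ptr}(\ell)[1] := 3$   (* initialize field 1 *)
$\{\ell \mapsto \langle 3, \mathtt{junk} \rangle\}$     $\mathtt{ptr}(\ell)[2] := 5$ (* initialize field 2 *)
$\{\ell \mapsto \langle 3, 5 \rangle\}$                 free $\mathtt{ptr}(\ell)$    (* free storage *)
$\{\ell \mapsto \mathtt{junk}\}$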
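The evaluation above can be replayed mechanically. The following Python sketch is an illustration under assumptions rather than the authors' implementation: the names `Store`, `Stuck`, `malloc`, `assign`, `project`, and `free` are hypothetical, locations are modelled as integers, and the substitution of $\mathtt{ptr}(\ell)$ for $x$ is modelled simply by returning the fresh location to the caller.

```python
JUNK = "junk"  # stands for the unusable value junk


class Stuck(Exception):
    """Raised when no rule of the operational semantics applies."""


class Store:
    """A store S: a finite map from locations to memory blocks or junk."""

    def __init__(self):
        self.cells = {}
        self.next_loc = 0

    def malloc(self, n):
        # Allocate n junk words at a fresh location (one not already in S).
        loc = self.next_loc
        self.next_loc += 1
        self.cells[loc] = [JUNK] * n
        return loc  # plays the role of ptr(l)

    def _block(self, loc, i=None):
        block = self.cells.get(loc)
        if not isinstance(block, list):
            raise Stuck(f"location {loc} does not hold a memory block")
        if i is not None and not 1 <= i <= len(block):
            raise Stuck(f"index {i} is out of range")
        return block

    def assign(self, loc, i, value):
        # ptr(l)[i] := v'   (components are numbered from 1)
        self._block(loc, i)[i - 1] = value

    def project(self, loc, i):
        # x = ptr(l)[i]
        return self._block(loc, i)[i - 1]

    def free(self, loc):
        # free ptr(l): afterwards the location maps to junk
        self._block(loc)
        self.cells[loc] = JUNK


s = Store()
x = s.malloc(2)           # {l -> <junk, junk>}
s.assign(x, 1, 3)         # {l -> <3, junk>}
s.assign(x, 2, 5)         # {l -> <3, 5>}
pair = list(s.cells[x])
s.free(x)                 # {l -> junk}
```

In this sketch a later `s.project(x, 1)` raises `Stuck`, which corresponds to the stuck states that the type system is designed to rule out statically.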

A sequence of instructions ($\iota$) ends in either a halt instruction, which stops
computation immediately, or a function application ($v(v_1, \ldots, v_n)$). In order to
simplify the language and its typing constructs, our functions never return. However,
a higher-level language that contains call and return statements can be
compiled into our language of locations by performing a continuation-passing
style (CPS) transformation [14, 15]. It is possible to define a direct-style language,
but doing so would force us to adopt an awkward syntax that allows
functions to return portions of the store. In CPS, all control-flow transfers
are handled symmetrically by calling a continuation.

Functions are defined using the form $\mathtt{fix}\ f[\Delta; C; \Gamma].\iota$. These functions are
recursive ($f$ may appear in $\iota$). The context $(\Delta; C; \Gamma)$ specifies a pre-condition
that must be satisfied before the function can be invoked. The type context $\Delta$
binds the set of type variables that can occur free in the term; $C$ is a collection
of aliasing constraints that statically approximates a portion of the store; and $\Gamma$
assigns types to free variables in $\iota$.

To call a polymorphic function, code must first instantiate the type variables
in $\Delta$ using the value form $v[\eta]$ or $v[C]$. These forms are treated as values because
type application has no computational effect (types and constraints are only used
for compile-time checking; they can be erased before executing a program).



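$(S,\ \mathtt{malloc}\ x, \rho, n;\ \iota) \longmapsto (S\{\ell \mapsto \langle \mathtt{junk}_1, \ldots, \mathtt{junk}_n \rangle\},\ \iota[\ell/\rho][\mathtt{ptr}(\ell)/x])$
    where $\ell \notin S$

$(S\{\ell \mapsto v\},\ \mathtt{free}\ \mathtt{ptr}(\ell);\ \iota) \longmapsto (S\{\ell \mapsto \mathtt{junk}\},\ \iota)$
    if $v = \langle v_1, \ldots, v_n \rangle$

$(S\{\ell \mapsto v\},\ \mathtt{ptr}(\ell)[i] := v';\ \iota) \longmapsto (S\{\ell \mapsto \langle v_1, \ldots, v_{i-1}, v', v_{i+1}, \ldots, v_n \rangle\},\ \iota)$
    if $v = \langle v_1, \ldots, v_n \rangle$ and $1 \le i \le n$

$(S\{\ell \mapsto v\},\ x = \mathtt{ptr}(\ell)[i];\ \iota) \longmapsto (S\{\ell \mapsto v\},\ \iota[v_i/x])$
    if $v = \langle v_1, \ldots, v_n \rangle$ and $1 \le i \le n$

$(S,\ v(v_1, \ldots, v_n)) \longmapsto (S,\ \iota[c_1, \ldots, c_m / \beta_1, \ldots, \beta_m][v', v_1, \ldots, v_n / f, x_1, \ldots, x_n])$
    if $v = v'[c_1, \ldots, c_m]$
    and $v' = \mathtt{fix}\ f[\Delta; C; x_1{:}\tau_1, \ldots, x_n{:}\tau_n].\iota$
    and $\mathit{Dom}(\Delta) = \beta_1, \ldots, \beta_m$ (where $\beta$ ranges over $\rho$ and $\epsilon$)

Fig. 2. Language of Locations: Operational Semantics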



3.2 Type Constructors 

There are three kinds of type constructors: locations^ ($\eta$), types ($\tau$), and aliasing
constraints ($C$). The simplest types are the base types, which we have chosen
to be integers ($\mathit{int}$). A pointer to a location $\eta$ is given the singleton type $\mathit{ptr}(\eta)$.
The only value in the type $\mathit{ptr}(\eta)$ is the pointer $\mathtt{ptr}(\eta)$, so if $v_1$ and $v_2$ both have
type $\mathit{ptr}(\eta)$, then they must be aliases. Memory blocks have types ($\langle \tau_1, \ldots, \tau_n \rangle$)
that describe their contents.

A collection of constraints, $C$, establishes the connection between pointers of
type $\mathit{ptr}(\eta)$ and the contents of the memory blocks they point to. The main form
of constraint, written $\{\eta \mapsto \tau\}$, models a store with a single location $\eta$ containing
a value of type $\tau$. Collections of constraints are constructed from more primitive
constraints using the join operator ($\oplus$). The empty constraint is denoted by $\emptyset$.
We often abbreviate $\{\eta \mapsto \tau\} \oplus \{\eta' \mapsto \tau'\}$ with $\{\eta \mapsto \tau, \eta' \mapsto \tau'\}$.

^ We use the meta-variable $\ell$ to denote concrete locations, $\rho$ to denote location variables,
and $\eta$ to denote either.
3.3 Static Semantics

Store Typing The central invariant maintained by the type system is that the
current constraints $C$ are a faithful description of the current store $S$. We write
this store-typing invariant as the judgement $\vdash S : C$. Intuitively, whenever a
location $\ell$ contains a value $v$ of type $\tau$, the constraints should specify that location
$\ell$ maps to $\tau$ (or an equivalent type $\tau'$). Formally:

$$\frac{\cdot;\cdot \vdash_v v_1 : \tau_1 \quad \cdots \quad \cdot;\cdot \vdash_v v_n : \tau_n}{\vdash \{\ell_1 \mapsto v_1, \ldots, \ell_n \mapsto v_n\} : \{\ell_1 \mapsto \tau_1, \ldots, \ell_n \mapsto \tau_n\}}$$

where for $1 \le i \le n$, the locations $\ell_i$ are all distinct. And,

$$\frac{\vdash S : C' \quad \cdot \vdash C' = C}{\vdash S : C}$$

Instruction Typing Instructions are type checked in a context $\Delta; C; \Gamma$. The judgement
$\Delta; C; \Gamma \vdash_\iota \iota$ states that the instruction sequence is well-formed. A related
judgement, $\Delta; \Gamma \vdash_v v : \tau$, ensures that the value $v$ is well-formed and has type
$\tau$.^

Our presentation of the typing rules for instructions focuses on how each rule
maintains the store-typing invariant. With this invariant in mind, consider the
rule for projection:

$$\frac{\Delta; \Gamma \vdash_v v : \mathit{ptr}(\eta) \quad \Delta \vdash C = C' \oplus \{\eta \mapsto \langle \tau_1, \ldots, \tau_n \rangle\} \quad \Delta; C; \Gamma, x{:}\tau_i \vdash_\iota \iota}{\Delta; C; \Gamma \vdash_\iota x = v[i];\ \iota} \quad (x \notin \Gamma,\ 1 \le i \le n)$$

The first pre-condition ensures that $v$ is a pointer. The second uses $C$ to determine
the contents of the location pointed to by $v$. More precisely, it requires that
$C$ equal a store description $C' \oplus \{\eta \mapsto \langle \tau_1, \ldots, \tau_n \rangle\}$. (Constraint equality uses $\Delta$
to denote the free type variables that may appear on the right-hand side.) The
store is unchanged by the operation so the final pre-condition requires that the
rest of the instructions be well-formed under the same constraints $C$.

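Next, examine the rule for the assignment operation:

$$\frac{\Delta; \Gamma \vdash_v v : \mathit{ptr}(\eta) \quad \Delta; \Gamma \vdash_v v' : \tau' \quad \Delta \vdash C = C' \oplus \{\eta \mapsto \langle \tau_1, \ldots, \tau_n \rangle\} \quad \Delta; C' \oplus \{\eta \mapsto \tau_{\mathit{after}}\}; \Gamma \vdash_\iota \iota}{\Delta; C; \Gamma \vdash_\iota v[i] := v';\ \iota} \quad (1 \le i \le n)$$

where $\tau_{\mathit{after}}$ is $\langle \tau_1, \ldots, \tau_{i-1}, \tau', \tau_{i+1}, \ldots, \tau_n \rangle$.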

Once again, the value $v$ must be a pointer to some location $\eta$. The type of the
contents of $\eta$ is given in $C$ and must be a block with type $\langle \tau_1, \ldots, \tau_n \rangle$. This
time the store has changed, and the remaining instructions are checked under
the appropriately modified constraint $C' \oplus \{\eta \mapsto \tau_{\mathit{after}}\}$.

^ The subscripts on $\vdash_v$ and $\vdash_\iota$ are used to distinguish judgement forms and for no
other purpose.







How can the type system ensure that the new constraints $C' \oplus \{\eta \mapsto \tau_{\mathit{after}}\}$
correctly describe the store? If $v'$ has type $\tau'$ and the contents of the location
$\eta$ originally has type $\langle \tau_1, \ldots, \tau_n \rangle$, then $\{\eta \mapsto \tau_{\mathit{after}}\}$ describes the contents of
the location $\eta$ after the update accurately. However, we must avoid a situation
in which $C'$ continues to hold an outdated type for the contents of the location
$\eta$. This task may appear trivial: search $C'$ for all occurrences of a constraint
$\{\eta \mapsto \tau\}$ and update all of the mappings appropriately. Unfortunately, in the
presence of location polymorphism, this approach will fail. Suppose a value is
stored in location $\rho_1$ and the current constraints are $\{\rho_1 \mapsto \tau, \rho_2 \mapsto \tau\}$. We
cannot determine whether or not $\rho_1$ and $\rho_2$ are aliases and therefore whether
the final constraint set should be $\{\rho_1 \mapsto \tau', \rho_2 \mapsto \tau'\}$ or $\{\rho_1 \mapsto \tau', \rho_2 \mapsto \tau\}$.

Our solution uses a technique from the literature on linear type systems.
Linear type systems prevent duplication of assumptions by disallowing uses of
the contraction rule. We use an analogous restriction in the definition of constraint
equality: the join operator $\oplus$ is associative and commutative, but not
idempotent. By ensuring that linear constraints cannot be duplicated, we can
prove that $\rho_1$ and $\rho_2$ from the example above cannot be aliases. The other equality
rules are unsurprising. The empty constraint collection is the identity for $\oplus$
and equality on types $\tau$ is syntactic up to $\alpha$-conversion of bound variables and
modulo equality on constraints. Therefore:

$$\Delta \vdash \{\rho_1 \mapsto \langle \mathit{int} \rangle\} \oplus \{\rho_2 \mapsto \langle \mathit{bool} \rangle\} = \{\rho_2 \mapsto \langle \mathit{bool} \rangle\} \oplus \{\rho_1 \mapsto \langle \mathit{int} \rangle\}$$

but,

$$\Delta \nvdash \{\rho_1 \mapsto \langle \mathit{int} \rangle\} \oplus \{\rho_2 \mapsto \langle \mathit{bool} \rangle\} = \{\rho_1 \mapsto \langle \mathit{int} \rangle\} \oplus \{\rho_1 \mapsto \langle \mathit{int} \rangle\} \oplus \{\rho_2 \mapsto \langle \mathit{bool} \rangle\}$$

Given these equality rules, we can prove that after an update of the store
with a value with a new type, the store typing invariant is preserved:

Lemma 1 (Store Update). If $\vdash S\{\ell \mapsto v\} : C \oplus \{\ell \mapsto \tau\}$ and $\cdot;\cdot \vdash_v v' : \tau'$
then $\vdash S\{\ell \mapsto v'\} : C \oplus \{\ell \mapsto \tau'\}$,

where $S\{\ell \mapsto v\}$ denotes the store $S$ extended with the mapping $\ell \mapsto v$ (provided
$\ell$ does not already appear on the left-hand side of any elements in $S$).
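One way to picture these equality rules is to treat a collection of linear constraints as a multiset. The Python sketch below is only an analogy under assumptions: `constraint`, `join`, and `EMPTY` are hypothetical helper names, types are compared as plain strings rather than up to $\alpha$-conversion, and constraint variables are not modelled.

```python
from collections import Counter

EMPTY = Counter()  # the empty constraint


def constraint(location, block_type):
    """A single linear constraint {location -> block_type}."""
    return Counter({(location, block_type): 1})


def join(*collections):
    """The join operator: multiset union, so duplicates are counted."""
    total = Counter()
    for c in collections:
        total += c
    return total


a = constraint("rho1", "<int>")
b = constraint("rho2", "<bool>")

commutative = join(a, b) == join(b, a)
associative = join(join(a, b), a) == join(a, join(b, a))
identity = join(a, EMPTY) == a
idempotent = join(a, a, b) == join(a, b)   # expected to be False
```

Under this reading, a collection that mentions `rho1` once can never be equal to one that mentions it twice, which is the property the no-duplication argument relies on.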

Function Typing The rule for function application $v(v_1, \ldots, v_n)$ is the rule one
would expect. In general, $v$ will be a value of the form $v'[c_1] \cdots [c_n]$ where $v'$
is a function polymorphic in locations and constraints and the type constructors
$c_1$ through $c_n$ instantiate its polymorphic variables. After substituting $c_1$
through $c_n$ for the polymorphic variables, the current constraints must equal
the constraints expected by the function $v$. This check guarantees that the
no-duplication property is preserved across function calls. To see why, consider the
polymorphic function foo where the type context $\Delta$ is $(\rho_1, \rho_2, \epsilon)$ and the constraints
$C$ are $\epsilon \oplus \{\rho_1 \mapsto \langle \mathit{int} \rangle, \rho_2 \mapsto \langle \mathit{int} \rangle\}$:

fix foo[$\Delta$; $C$; $x{:}\mathit{ptr}(\rho_1)$, $y{:}\mathit{ptr}(\rho_2)$, $\mathit{cont}{:}\forall[\cdot; \epsilon].(\mathit{int}) \to 0$].
    free x;      (* constraints = $\epsilon \oplus \{\rho_2 \mapsto \langle \mathit{int} \rangle\}$ *)
    z = y[1];    (* ok because $y : \mathit{ptr}(\rho_2)$ and $\{\rho_2 \mapsto \langle \mathit{int} \rangle\}$ *)
    free y;      (* constraints = $\epsilon$ *)
    cont(z)      (* return/continue *)
This function deallocates its two arguments, $x$ and $y$, before calling its continuation
with the contents of $y$. It is easy to check that this function type-checks,
but should it? If foo is called in a state where $\rho_1$ and $\rho_2$ are aliases, a run-time
error will result when the second instruction is executed because the location
pointed to by $y$ will already have been deallocated. Fortunately, our type system
guarantees that foo can never be called from such a state.

Suppose that the store currently contains a single integer reference: $\{\ell \mapsto \langle 3 \rangle\}$.
This store can be described by the constraints $\{\ell \mapsto \langle \mathit{int} \rangle\}$. If the programmer
attempts to instantiate both $\rho_1$ and $\rho_2$ with the same label $\ell$, the function
call $\mathit{foo}[\ell, \ell, \emptyset](\mathtt{ptr}(\ell))$ will fail to type check because the constraints $\{\ell \mapsto \langle \mathit{int} \rangle\}$
do not equal the pre-condition $\emptyset \oplus \{\ell \mapsto \langle \mathit{int} \rangle, \ell \mapsto \langle \mathit{int} \rangle\}$.
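The failed call can be checked by hand with a few lines of code. This Python sketch is an assumption-laden illustration, not part of the type checker described here: `FOO_PRE`, `instantiate`, and `call_ok` are hypothetical names, constraints are modelled as a multiset of (location, type) pairs, and the instantiation of $\epsilon$ is passed separately as `eps`.

```python
from collections import Counter

# Linear part of foo's pre-condition: {rho1 -> <int>, rho2 -> <int>}.
FOO_PRE = Counter({("rho1", "<int>"): 1, ("rho2", "<int>"): 1})


def instantiate(precondition, loc_subst, eps):
    """Substitute locations for location variables and add the choice for epsilon."""
    result = Counter(eps)
    for (rho, ty), count in precondition.items():
        result[(loc_subst.get(rho, rho), ty)] += count
    return result


def call_ok(current, loc_subst, eps):
    """The call type-checks only if the current constraints equal the pre-condition."""
    return current == instantiate(FOO_PRE, loc_subst, eps)


one_cell = Counter({("l", "<int>"): 1})
two_cells = Counter({("l", "<int>"): 1, ("l'", "<int>"): 1})

aliased_call = call_ok(one_cell, {"rho1": "l", "rho2": "l"}, Counter())
distinct_call = call_ok(two_cells, {"rho1": "l", "rho2": "l'"}, Counter())
```

Instantiating both location variables with the same label demands two copies of the constraint on that label, and a store with a single cell cannot supply them, so `aliased_call` is `False` while `distinct_call` is `True`.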

Figure 3 contains the typing rules for values and instructions. Note that the
judgement $\Delta \vdash_{\mathit{wf}} \tau$ indicates that $\Delta$ contains the free type variables in $\tau$.

3.4 Soundness 

Our typing rules enforce the property that well-typed programs cannot enter
stuck states. A state $(S, \iota)$ is stuck when no reductions of the operational semantics
apply and $\iota \neq \mathtt{halt}$. The following theorem captures this idea formally:

Theorem 1 (Soundness). If $\vdash S : C$ and $\cdot; C; \cdot \vdash_\iota \iota$ and $(S, \iota) \longmapsto \cdots \longmapsto (S', \iota')$,
then $(S', \iota')$ is not a stuck state.

We prove soundness syntactically in the style of Wright and Felleisen [28].
The proof appears in the companion technical report [19].

4 Non-linear Constraints 

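Most linear type systems contain a class of non-linear values that can be used
in a completely unrestricted fashion. Our system is similar in that it admits
non-linear constraints, written $\{\eta \mapsto \tau\}^{\omega}$. They are characterized by the axiom:

$$\Delta \vdash \{\eta \mapsto \tau\}^{\omega} = \{\eta \mapsto \tau\}^{\omega} \oplus \{\eta \mapsto \tau\}^{\omega}$$

Unlike the constraints of the previous section, non-linear constraints may be
duplicated. Therefore, it is not sound to deallocate memory described by non-linear
constraints or to use it at different types. Because there are strictly fewer
operations on non-linear constraints than linear constraints, there is a natural
subtyping relation between the two: $\{\eta \mapsto \tau\} \le \{\eta \mapsto \tau\}^{\omega}$. We extend the
subtyping relationship on single constraints to collections of constraints with
rules for reflexivity, transitivity, and congruence. For example, assume add has
type $\forall[\rho_1, \rho_2, \epsilon; \{\rho_1 \mapsto \langle \mathit{int} \rangle\}^{\omega} \oplus \{\rho_2 \mapsto \langle \mathit{int} \rangle\}^{\omega} \oplus \epsilon].(\mathit{ptr}(\rho_1), \mathit{ptr}(\rho_2)) \to 0$ and
consider this code:

Instructions                    Constraints (initially $\emptyset$)

malloc $x, \rho, 1$;            $C_1 = \{\rho \mapsto \langle \mathit{junk} \rangle\}$, $x : \mathit{ptr}(\rho)$
$x[1] := 3$;                    $C_2 = \{\rho \mapsto \langle \mathit{int} \rangle\}$
$\mathit{add}[\rho, \rho, \emptyset](x, x)$   $C_2 \le \{\rho \mapsto \langle \mathit{int} \rangle\}^{\omega} = \{\rho \mapsto \langle \mathit{int} \rangle\}^{\omega} \oplus \{\rho \mapsto \langle \mathit{int} \rangle\}^{\omega} \oplus \emptyset$

Typing rules for non-linear constraints are presented in Figure 4.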
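The contrast between linear and non-linear constraints can be sketched by modelling the former as a multiset and the latter as a set. The Python fragment below is a simplification under assumptions: `weaken` and `join_weak` are hypothetical names, and mixed collections containing both linear and non-linear constraints are not handled.

```python
from collections import Counter


def weaken(linear):
    """Apply {eta -> tau} <= {eta -> tau}^omega to every linear constraint."""
    return frozenset(linear)  # iterating a Counter yields its keys


def join_weak(*collections):
    """Join of non-linear constraints: set union, hence idempotent."""
    return frozenset().union(*collections)


c2 = Counter({("rho", "<int>"): 1})                 # linear {rho -> <int>}
w = frozenset({("rho", "<int>")})                   # non-linear {rho -> <int>}^omega

# add[rho, rho, empty] expects w joined with w joined with the empty constraint.
add_precondition = join_weak(w, w, frozenset())

call_accepted = weaken(c2) == add_precondition      # subtyping makes the call fit
linear_duplicates = (c2 + c2) == c2                 # linear copies do not collapse
```

Here `call_accepted` is `True` because the duplicated non-linear constraint collapses to a single one, whereas `linear_duplicates` is `False`.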
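Judgement form: $\Delta; \Gamma \vdash_v v : \tau$

$$\frac{}{\Delta; \Gamma \vdash_v i : \mathit{int}} \qquad \frac{}{\Delta; \Gamma \vdash_v x : \Gamma(x)} \qquad \frac{}{\Delta; \Gamma \vdash_v \mathtt{junk} : \mathit{junk}}$$

$$\frac{\Delta \vdash_{\mathit{wf}} \eta}{\Delta; \Gamma \vdash_v \mathtt{ptr}(\eta) : \mathit{ptr}(\eta)} \qquad \frac{\Delta; \Gamma \vdash_v v_1 : \tau_1 \quad \cdots \quad \Delta; \Gamma \vdash_v v_n : \tau_n}{\Delta; \Gamma \vdash_v \langle v_1, \ldots, v_n \rangle : \langle \tau_1, \ldots, \tau_n \rangle}$$

$$\frac{\Delta \vdash_{\mathit{wf}} \forall[\Delta'; C].(\tau_1, \ldots, \tau_n) \to 0 \qquad \Delta, \Delta'; C; \Gamma, f{:}\forall[\Delta'; C].(\tau_1, \ldots, \tau_n) \to 0, x_1{:}\tau_1, \ldots, x_n{:}\tau_n \vdash_\iota \iota}{\Delta; \Gamma \vdash_v \mathtt{fix}\ f[\Delta'; C; x_1{:}\tau_1, \ldots, x_n{:}\tau_n].\iota : \forall[\Delta'; C].(\tau_1, \ldots, \tau_n) \to 0}$$

$$\frac{\Delta \vdash_{\mathit{wf}} \eta \qquad \Delta; \Gamma \vdash_v v : \forall[\rho, \Delta'; C].(\tau_1, \ldots, \tau_n) \to 0}{\Delta; \Gamma \vdash_v v[\eta] : \forall[\Delta'; C].(\tau_1, \ldots, \tau_n) \to 0\,[\eta/\rho]}$$

$$\frac{\Delta \vdash_{\mathit{wf}} C \qquad \Delta; \Gamma \vdash_v v : \forall[\epsilon, \Delta'; C'].(\tau_1, \ldots, \tau_n) \to 0}{\Delta; \Gamma \vdash_v v[C] : \forall[\Delta'; C'].(\tau_1, \ldots, \tau_n) \to 0\,[C/\epsilon]} \qquad \frac{\Delta; \Gamma \vdash_v v : \tau' \qquad \Delta \vdash \tau' = \tau}{\Delta; \Gamma \vdash_v v : \tau}$$

Judgement form: $\Delta; C; \Gamma \vdash_\iota \iota$

$$\frac{\Delta, \rho; C \oplus \{\rho \mapsto \langle \mathit{junk}_1, \ldots, \mathit{junk}_n \rangle\}; \Gamma, x{:}\mathit{ptr}(\rho) \vdash_\iota \iota}{\Delta; C; \Gamma \vdash_\iota \mathtt{malloc}\ x, \rho, n;\ \iota} \quad (x \notin \Gamma,\ \rho \notin \Delta)$$

$$\frac{\Delta; \Gamma \vdash_v v : \mathit{ptr}(\eta) \qquad \Delta \vdash C = C' \oplus \{\eta \mapsto \langle \tau_1, \ldots, \tau_n \rangle\} \qquad \Delta; C' \oplus \{\eta \mapsto \mathit{junk}\}; \Gamma \vdash_\iota \iota}{\Delta; C; \Gamma \vdash_\iota \mathtt{free}\ v;\ \iota}$$

$$\frac{\Delta; \Gamma \vdash_v v : \mathit{ptr}(\eta) \qquad \Delta \vdash C = C' \oplus \{\eta \mapsto \langle \tau_1, \ldots, \tau_n \rangle\} \qquad \Delta; \Gamma \vdash_v v' : \tau' \qquad \Delta; C' \oplus \{\eta \mapsto \langle \tau_1, \ldots, \tau_{i-1}, \tau', \tau_{i+1}, \ldots, \tau_n \rangle\}; \Gamma \vdash_\iota \iota}{\Delta; C; \Gamma \vdash_\iota v[i] := v';\ \iota} \quad (1 \le i \le n)$$

$$\frac{\Delta; \Gamma \vdash_v v : \mathit{ptr}(\eta) \qquad \Delta \vdash C = C' \oplus \{\eta \mapsto \langle \tau_1, \ldots, \tau_n \rangle\} \qquad \Delta; C; \Gamma, x{:}\tau_i \vdash_\iota \iota}{\Delta; C; \Gamma \vdash_\iota x = v[i];\ \iota} \quad (x \notin \Gamma,\ 1 \le i \le n)$$

$$\frac{\Delta; \Gamma \vdash_v v : \forall[\cdot; C'].(\tau_1, \ldots, \tau_n) \to 0 \qquad \Delta \vdash C = C' \qquad \Delta; \Gamma \vdash_v v_1 : \tau_1 \quad \cdots \quad \Delta; \Gamma \vdash_v v_n : \tau_n}{\Delta; C; \Gamma \vdash_\iota v(v_1, \ldots, v_n)} \qquad \frac{}{\Delta; C; \Gamma \vdash_\iota \mathtt{halt}}$$

Fig. 3. Language of Locations: Value and Instruction Typing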
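$$\frac{\Delta; \Gamma \vdash_v v : \mathit{ptr}(\eta) \qquad \Delta \vdash C = C' \oplus \{\eta \mapsto \langle \tau_1, \ldots, \tau_n \rangle\}^{\omega} \qquad \Delta; C; \Gamma, x{:}\tau_i \vdash_\iota \iota}{\Delta; C; \Gamma \vdash_\iota x = v[i];\ \iota} \quad (x \notin \Gamma,\ 1 \le i \le n)$$

$$\frac{\Delta; \Gamma \vdash_v v : \mathit{ptr}(\eta) \qquad \Delta; \Gamma \vdash_v v' : \tau' \qquad \Delta \vdash C = C' \oplus \{\eta \mapsto \langle \tau_1, \ldots, \tau_n \rangle\}^{\omega} \qquad \Delta \vdash \tau' = \tau_i \qquad \Delta; C; \Gamma \vdash_\iota \iota}{\Delta; C; \Gamma \vdash_\iota v[i] := v';\ \iota} \quad (1 \le i \le n)$$

$$\frac{\Delta; \Gamma \vdash_v v : \forall[\cdot; C'].(\tau_1, \ldots, \tau_n) \to 0 \qquad \Delta \vdash C \le C' \qquad \Delta; \Gamma \vdash_v v_1 : \tau_1 \quad \cdots \quad \Delta; \Gamma \vdash_v v_n : \tau_n}{\Delta; C; \Gamma \vdash_\iota v(v_1, \ldots, v_n)}$$

$$\frac{\vdash S : C' \qquad \cdot \vdash C' \le C}{\vdash S : C}$$

Fig. 4. Language of Locations: Non-linear Constraints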



4.1 Non-linear Constraints and Dynamic Type Tests 

Although data structures described by non-linear constraints cannot be deallocated
or used to store objects of varying types, we can still take advantage
of the sharing implied by singleton pointer types. More specifically, code can
use weak constraints to perform a dynamic type test on a particular object and
simultaneously refine the types of many aliases of that object.

To demonstrate this application, we extend the language discussed in the
previous section with a simple form of option type $?\langle \tau_1, \ldots, \tau_n \rangle$ (see Figure 5).
Options may be null or a memory block $\langle \tau_1, \ldots, \tau_n \rangle$. The mknull operation
associates the name $\rho$ with null and the tosum $v, \tau$ instruction injects the value $v$
(a location containing null or a memory block) into a location for the option type
$?\langle \tau_1, \ldots, \tau_n \rangle$. In the typing rules for tosum and ifnull, the annotation $\phi$ may
either be $\omega$, which indicates a non-linear constraint, or $\cdot$, the empty annotation,
which indicates a linear constraint.

The ifnull $v$ then $\iota_1$ else $\iota_2$ construct tests an option to determine whether
it is null or not. Assuming $v$ has type $\mathit{ptr}(\eta)$, we check the first branch ($\iota_1$)
with the constraint $\{\eta \mapsto \mathit{null}\}^{\phi}$ and the second branch with the constraint
$\{\eta \mapsto \langle \tau_1, \ldots, \tau_n \rangle\}^{\phi}$ where $\langle \tau_1, \ldots, \tau_n \rangle$ is the appropriate non-null variant. As
before, imagine that $sp$ is the stack pointer, which contains an integer option.

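                        (* constraints = $\{\rho \mapsto \langle \mathit{ptr}(\rho') \rangle, \rho' \mapsto\ ?\langle \mathit{int} \rangle\}$, $sp : \mathit{ptr}(\rho)$ *)
r1 = sp[1];             (* $r_1 : \mathit{ptr}(\rho')$ *)
ifnull r1 then halt     (* null check *)
else ...                (* constraints = $\{\rho \mapsto \langle \mathit{ptr}(\rho') \rangle\} \oplus \{\rho' \mapsto \langle \mathit{int} \rangle\}$ *)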

Notice that a single null test refines the type of multiple aliases; both $r_1$ and
its alias on the stack $sp[1]$ can be used as integer references in the else clause.
Future loads of $r_1$ or its alias will not have to perform a null-check.
These additional features of our language are also proven sound in the companion
technical report [19].



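Syntax:

types          $\tau ::= \ldots \mid\ ?\langle \tau_1, \ldots, \tau_n \rangle \mid \mathit{null}$
values         $v ::= \ldots \mid \mathtt{null}$
instructions   $\iota ::= \ldots \mid \mathtt{mknull}\ x, \rho;\ \iota \mid \mathtt{tosum}\ v, ?\langle \tau_1, \ldots, \tau_n \rangle;\ \iota \mid \mathtt{ifnull}\ v\ \mathtt{then}\ \iota_1\ \mathtt{else}\ \iota_2$

Operational semantics:

$(S,\ \mathtt{mknull}\ x, \rho;\ \iota) \longmapsto (S\{\ell \mapsto \mathtt{null}\},\ \iota[\ell/\rho][\mathtt{ptr}(\ell)/x])$
    where $\ell \notin S$

$(S,\ \mathtt{tosum}\ v, ?\langle \tau_1, \ldots, \tau_n \rangle;\ \iota) \longmapsto (S,\ \iota)$

$(S\{\ell \mapsto \mathtt{null}\},\ \mathtt{ifnull}\ \mathtt{ptr}(\ell)\ \mathtt{then}\ \iota_1\ \mathtt{else}\ \iota_2) \longmapsto (S\{\ell \mapsto \mathtt{null}\},\ \iota_1)$

$(S\{\ell \mapsto \langle v_1, \ldots, v_n \rangle\},\ \mathtt{ifnull}\ \mathtt{ptr}(\ell)\ \mathtt{then}\ \iota_1\ \mathtt{else}\ \iota_2) \longmapsto (S\{\ell \mapsto \langle v_1, \ldots, v_n \rangle\},\ \iota_2)$

Static Semantics:

$$\frac{}{\Delta; \Gamma \vdash_v \mathtt{null} : \mathit{null}}$$

$$\frac{\Delta, \rho; C \oplus \{\rho \mapsto \mathit{null}\}; \Gamma, x{:}\mathit{ptr}(\rho) \vdash_\iota \iota}{\Delta; C; \Gamma \vdash_\iota \mathtt{mknull}\ x, \rho;\ \iota} \quad (x \notin \Gamma,\ \rho \notin \Delta)$$

$$\frac{\Delta; \Gamma \vdash_v v : \mathit{ptr}(\eta) \qquad \Delta \vdash C = C' \oplus \{\eta \mapsto \mathit{null}\}^{\phi} \qquad \Delta \vdash_{\mathit{wf}}\ ?\langle \tau_1, \ldots, \tau_n \rangle \qquad \Delta; C' \oplus \{\eta \mapsto\ ?\langle \tau_1, \ldots, \tau_n \rangle\}^{\phi}; \Gamma \vdash_\iota \iota}{\Delta; C; \Gamma \vdash_\iota \mathtt{tosum}\ v, ?\langle \tau_1, \ldots, \tau_n \rangle;\ \iota}$$

$$\frac{\Delta; \Gamma \vdash_v v : \mathit{ptr}(\eta) \qquad \Delta \vdash C = C' \oplus \{\eta \mapsto \langle \tau_1, \ldots, \tau_n \rangle\}^{\phi} \qquad \Delta; C' \oplus \{\eta \mapsto\ ?\langle \tau_1, \ldots, \tau_n \rangle\}^{\phi}; \Gamma \vdash_\iota \iota}{\Delta; C; \Gamma \vdash_\iota \mathtt{tosum}\ v, ?\langle \tau_1, \ldots, \tau_n \rangle;\ \iota}$$

$$\frac{\Delta; \Gamma \vdash_v v : \mathit{ptr}(\eta) \qquad \Delta \vdash C = C' \oplus \{\eta \mapsto\ ?\langle \tau_1, \ldots, \tau_n \rangle\}^{\phi} \qquad \Delta; C' \oplus \{\eta \mapsto \mathit{null}\}^{\phi}; \Gamma \vdash_\iota \iota_1 \qquad \Delta; C' \oplus \{\eta \mapsto \langle \tau_1, \ldots, \tau_n \rangle\}^{\phi}; \Gamma \vdash_\iota \iota_2}{\Delta; C; \Gamma \vdash_\iota \mathtt{ifnull}\ v\ \mathtt{then}\ \iota_1\ \mathtt{else}\ \iota_2}$$

Fig. 5. Language of Locations: Extensions for option types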



5 Related and Future Work 

Our research extends previous work on linear type systems [26] and syntactic
control of interference [16] by allowing both aliasing and safe deallocation. Several
authors [26, 3, 9] have explored alternatives to pure linear type systems
to allow greater flexibility. Wadler [26], for example, introduced a new let-form
let ! $(x)\ y = e_1$ in $e_2$ that permits the variable $x$ to be used as a non-linear
value in $e_1$ (i.e., it can be used many times, albeit in a restricted fashion) and
then later used as a linear value in $e_2$. We believe we can encode similar behavior
by extending our simple subtyping with bounded quantification. For instance,
if a function $f$ requires some collection of aliasing constraints $\epsilon$ that are bounded
above by $\{\rho_1 \mapsto \langle \mathit{int} \rangle\}^{\omega} \oplus \{\rho_2 \mapsto \langle \mathit{int} \rangle\}^{\omega}$, then $f$ may be called with a
single linear constraint $\{\rho \mapsto \langle \mathit{int} \rangle\}$ (instantiating both $\rho_1$ and $\rho_2$ with $\rho$ and
$\epsilon$ with $\{\rho \mapsto \langle \mathit{int} \rangle\}$). The constraints may now be used non-linearly within the
body of $f$. Provided $f$ expects a continuation with constraints $\epsilon$, its continuation
will retain the knowledge that $\{\rho \mapsto \langle \mathit{int} \rangle\}$ is linear and will be able to deallocate
the storage associated with $\rho$ when it is called. However, we have not yet
implemented this feature.

Because our type system is constructed from standard type-theoretic building 
blocks, including linear and singleton types, it is relatively straightforward to 
implement these ideas in a modern type-directed compiler. In some ways, our new 
mechanisms simplify previous work. Previous versions of TAL [12, 11] possessed 
two separate mechanisms for initializing data structures. Uninitialized heap- 
allocated data structures were stamped with the type at which they would be 
used. On the other hand, stack slots could be overwritten with values of arbitrary 
types. Our new system allows us to treat memory more uniformly. In fact, our 
new language can encode stack types similar to those described by Morrisett 
et al. [11] except that activation records are allocated on the heap rather than 
using a conventional call stack. The companion technical report [19] shows how 
to compile a simple imperative language in such a way that it allocates and 
deletes its own stack frames. 

This research is also related to other work on type systems for low-level
languages. Work on Java bytecode verification [20, 8] also develops type systems
that allow locations to hold values of different types. However, the Java bytecode
type system is not strong enough to represent aliasing as we do here.

The development of our language was inspired by the Calculus of Capa- 
bilities (CC) [4]. CC provides an alternative to the region-based type system 
developed by Tofte and Talpin [24]. Because safe region deallocation requires 
that no aliases be used in the future, CC tracks region aliases. In our new lan- 
guage we adapt CC’s techniques to track both object aliases and object type 
information. 

Our work also has close connections with research on alias analyses [5, 21, 
17]. Much of that work aims to facilitate program optimizations that require 
aliasing information in order to be correct. However, these optimizations do not 
necessarily make it harder to check the safety of the resulting program. Other 
work [7, 6] attempts to determine when programs written in unsafe languages, 
such as C, perform potentially unsafe operations. Our goals are closer to the 
latter application but differ because we are most interested in compiling safe 
languages and producing low-level code that can be proven safe in a single pass 
over the program. Moreover, our main result is not a new analysis technique,
but rather a sound system for representing and checking the results of analysis,
and, in particular, for representing aliasing in low-level compiler-introduced data 
structures rather than for representing aliasing in source-level data. 

The language of locations is a flexible framework for reasoning about sharing 
and destructive operations in a type-safe manner. However, our work to date is 
only a first step in this area and we are investigating a number of extensions. In 
particular, we are working on integrating recursive types into the type system as 
they would allow us to capture regular repeating structure in the store. When 
we have completed this task, we believe our aliasing constraints will provide us 
with a safe, but rich and reusable, set of memory abstractions. 
Abstract. The basic idea behind improving the quality of a monovariant control
flow analysis such as 0CFA is the concept of polyvariant analyses such as Agesen’s
Cartesian Product Algorithm (CPA) and Shivers’ nCFA. In this paper we deve- 
lop a novel framework for polyvariant flow analysis based on Aiken-Wimmers 
constrained type theory. We develop instantiations of our framework to formalize 
various polyvariant algorithms, including nCFA and CPA. With our CPA forma- 
lization, we show the call-graph based termination condition for CPA will not 
always guarantee termination. We then develop a novel termination condition and 
prove it indeed leads to a terminating algorithm. Additionally, we show how data 
polymorphism can be modeled in the framework, by defining a simple extension 
to CPA that incorporates data polymorphism. 



1 Introduction 

The basic idea behind improving the precision of a simple control flow analysis such
as 0CFA is the concept of polyvariant analysis, also known as flow splitting. For better
analysis precision, the definition of a polymorphic function is re-analyzed multiple times 
with respect to different application contexts. The original polyvariant generalization 
of the monovariant 0CFA control flow algorithm is the nCFA algorithm, defined by
Shivers [17]. This generalization however has been shown to be not so effective: the
values of n needed to obtain more accurate analyses are usually beyond the realm of the
easily computable, and even 1CFA can be quite slow to compute [19]. Better notions of
polyvariant analysis have been developed. In particular, Agesen’s CPA [1,2] analyzes
programs with parametric polymorphism in an efficient and adaptive manner. 

In this paper we develop a general framework for polyvariant flow analysis with 
Aiken-Wimmers constrained types [3]. We represent each function definition with a polymorphic
constrained type scheme of the form $(\forall\, \bar{t}.\ t \to \tau \setminus C)$. The subtyping constraint
set C bound in the type scheme captures the flow corresponding to the function body. 
Each re-analysis of the function is realized by a new instantiation of the type scheme. 

There have recently been several frameworks developed for polyvariant flow analysis,
in terms of union and intersection types [16], abstract interpretation [13], flow graphs
[12], and a more implementation-centric approach [10]. Our purpose in designing a new framework
is not primarily to give “yet another framework” for polyvariant flow analysis, but to
develop a framework particularly useful for the development of new polyvariant analyses,
and for improving on implementations of existing analyses. We will give an example of a
new analysis developed within the framework, Data-Adaptive CPA, which extends CPA
to incorporate data polymorphism. There also are implementation advantages obtained 
by basing analyses on polymorphic constrained types. Compared to the flow graph based 
approach used in other implementations of flow analyses [2, 10, 14], our framework 
has several advantages: using techniques described in [8, 15], constrained types can be 
simplified on-the-fly and garbage collection of unreachable constraints can be performed 
as well, leading to more efficient analyses; and, re-analysis of a function in a different 
polyvariant context is also realized by instantiation of the function’s constrained type 
scheme, and does not require re-analysis of the function body. 

This paper presents the first proposal to use constrained type schemes to model 
polyvariance; there are several other related approaches in the literature. Palsberg and 
Pavlopoulou [16] develop an elegant framework for polyvariant analyses in a type system
with union/intersection types and subtyping. There are also subtype-free type-based 
realizations of polymorphism which can be adapted to polyvariant flow analysis. Let- 
polymorphism is the classic form of polymorphism used in type inference for subtype- 
free languages, and has been adapted to constrained types in [3, 7], as well as directly 
in the flow analysis setting by Wright and Jagannathan [19]. Another representation of 
polymorphism found in subtype-free languages is via rank-2 intersection types [11], 
which has also been applied to polyvariant flow analysis [4]. The Church group has
developed type systems of union/intersection types decorated with flow labels to indicate 
the flow information [18]. 

The form of polyvariance we use is quite general: we show how CPA, nCFA, and 
other analyses may be expressed in the framework. A $\forall$ type is given to each function in
the program, and for every different call site and each different type of argument value
the function is applied to, a new contour (re-analysis of the function via instantiation
of the $\forall$ type) is possible. The framework is flexible in how contours are generated: a
completely new contour can be assigned for a particular argument type applied to the
function, or for that argument type it can share a pre-existing contour. For example,
0CFA is the strategy which uses exactly one contour for every function.

One difficult problem for CPA is the issue of termination: without a termination 
check, the analysis may loop forever on some programs, producing infinitely many con- 
tours. We develop a termination condition which detects a certain kind of self-referential 
flow in the constraints and prove that by merging some contours in this case, non- 
termination is prevented and the analysis is implementable. Our termination condition is 
different from the call-graph based condition commonly used in other algorithms, which 
we show will not guarantee termination in all cases. 

We also aim here to model polyvariant algorithms capable of handling data polymor- 
phism: the ability of an imperative variable to hold values of different types at run-time. 
Data polymorphism arises quite frequently in object-oriented programming, especially 
with container classes, and it poses special problems for flow analysis. The one precise 
algorithm for detecting data polymorphism is the iterative flow analysis (IFA) of Plevyak 
and Chien [14]. We present a simple non-iterative algorithm, Data-Adaptive CPA, based
on an approach distinct from that of IFA.

2 A Framework for Polyvariant Flow Analysis

This section presents the framework of polyvariant constrained type inference. In the 
next section we instantiate the framework for particular analyses. 

2.1 The Language 

The language we study here is an extension of the language used in Palsberg and
Pavlopoulou’s union/intersection type framework for flow analysis [16], adding mutable state
so we can model data polymorphism. We believe the concepts of this paper should
scale with relative ease to languages with records, objects, classes, and other features,
as illustrated in [7, 6].

Definition 2.1 (The language):

$$e ::= x \mid n \mid \mathtt{succ}\ e \mid \mathtt{if0}\ e\ e\ e \mid \lambda x.e \mid e\ e \mid \mathtt{new} \mid e := e \mid {!e} \mid e;\,e$$

This is a standard call-by-value lambda calculus extended with reference cells. Exe- 
cution of a new expression creates a fresh, uninitialized reference cell. We use new 
because it models the memory creation mode of languages like Java and C++, where 
uninitialized references are routinely created. Recursive definitions may be constructed
in this language via the Y-combinator.



2.2 The Types 

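Our basis is an Aiken-Wimmers-style constraint system [3]; in particular it is most closely
derived from the system described in [7], which combines constraints and mutable state.

Definition 2.2 (Types): The type grammar is as follows.

$$
\begin{array}{rcl}
\tau \in \mathit{Type} & ::= & t \mid \tau v \mid \mathtt{read}\ t \mid \mathtt{write}\ \tau \mid t_1 \to t_2 \\
t \in \mathit{TypeVar} & \supseteq & \mathit{ImpTypeVar} \\
u & \in & \mathit{ImpTypeVar} \\
\bar{t} \in \mathit{TypeVarSet} & = & \mathcal{P}_{fin}(\mathit{TypeVar}) \\
\tau v \in \mathit{ValueType} & ::= & \mathtt{int} \mid (\forall\, \bar{t}.\ t \to \tau \backslash C) \mid \mathtt{ref}\ u \\
\tau_1 <: \tau_2 & \in & \mathit{Constraint} \\
C \in \mathit{ConstraintSet} & = & \mathcal{P}_{\omega}(\mathit{Constraint})
\end{array}
$$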

The types for the most part are standard. Function uses (call sites) are given type
$t_1 \to t_2$. $\mathit{ValueType}$ are the types for data values; $\mathtt{ref}\ u$ is the type for a cell whose
content has type $u$. We distinguish imperative type variables $u \in \mathit{ImpTypeVar}$ for
the presentation of data polymorphism. Read and write operations on reference cells are
represented with types $\mathtt{read}\ t$ and $\mathtt{write}\ \tau$ respectively. Functions are given polymorphic
types $(\forall\, \bar{t}.\ t \to \tau \backslash C)$, where $t$ is the type variable for the formal argument, $\tau$
is the return type, $C$ is the set of constraints bound in this type scheme, and $\bar{t}$ is the set
of bound type variables. Such types are also referred to as $\forall$ types or closure types in the
paper.
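
$$
\begin{array}{ll}
(\text{Var}) & \dfrac{A(x) = t}{A \vdash x : t \backslash \{\}} \\[2ex]
(\text{Int}) & \dfrac{}{A \vdash n : \mathtt{int} \backslash \{\}} \\[2ex]
(\text{Succ}) & \dfrac{A \vdash e : \tau \backslash C}{A \vdash \mathtt{succ}\ e : \mathtt{int} \backslash \{\tau <: \mathtt{int}\} \cup C} \\[2ex]
(\text{If0}) & \dfrac{A \vdash e_1 : \tau_1 \backslash C_1,\quad A \vdash e_2 : \tau_2 \backslash C_2,\quad A \vdash e_3 : \tau_3 \backslash C_3}{A \vdash \mathtt{if0}\ e_1\ e_2\ e_3 : t \backslash \{\tau_1 <: \mathtt{int},\ \tau_2 <: t,\ \tau_3 <: t\} \cup C_1 \cup C_2 \cup C_3} \\[2ex]
(\text{Abs}) & \dfrac{A, \{x : t\} \vdash e : \tau \backslash C}{A \vdash \lambda x.e : (\forall\, \bar{t}.\ t \to \tau \backslash C) \backslash \{\}} \quad \text{where } \bar{t} = \mathit{FreeTypeVar}(t \to \tau \backslash C) - \mathit{FreeTypeVar}(A) \\[2ex]
(\text{Appl}) & \dfrac{A \vdash e_1 : \tau_1 \backslash C_1,\quad A \vdash e_2 : \tau_2 \backslash C_2}{A \vdash e_1\ e_2 : t_2 \backslash \{\tau_1 <: t_1 \to t_2,\ \tau_2 <: t_1\} \cup C_1 \cup C_2} \\[2ex]
(\text{New}) & \dfrac{}{A \vdash \mathtt{new} : \mathtt{ref}\ u \backslash \{\}} \\[2ex]
(\text{Read}) & \dfrac{A \vdash e : \tau \backslash C}{A \vdash {!e} : t \backslash \{\tau <: \mathtt{read}\ t\} \cup C} \\[2ex]
(\text{Write}) & \dfrac{A \vdash e_1 : \tau_1 \backslash C_1,\quad A \vdash e_2 : \tau_2 \backslash C_2}{A \vdash e_1 := e_2 : \tau_2 \backslash \{\tau_1 <: \mathtt{write}\ \tau_2\} \cup C_1 \cup C_2} \\[2ex]
(\text{Seq}) & \dfrac{A \vdash e_1 : \tau_1 \backslash C_1,\quad A \vdash e_2 : \tau_2 \backslash C_2}{A \vdash e_1 ; e_2 : \tau_2 \backslash C_1 \cup C_2}
\end{array}
$$

Fig. 1. Type inference rules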



2.3 The Type Inference Rules 

We present the type inference rules in Figure 1. A type environment $A$ is a mapping
from program variables to type variables. Given a type environment $A$, the proof system
assigns a type to expression $e$ via the type judgment $A \vdash e : \tau \backslash C$, where $\tau$ is the
type for $e$, and $C$ is the set of constraints which models the flow paths in $e$. We abbreviate
$A \vdash e : \tau \backslash C$ as $\vdash e : \tau \backslash C$ when $A$ is empty. The rules are deterministic
except that nondeterminism may arise in the choice of type variables. We restrict type
derivations to be of a form where fresh type variables are used whenever possible.
With this restriction, type inference is trivially decidable and is unique modulo choice
of type variable names.

Definition 2.3 (Type inference algorithm): For closed expression $e$, its inferred type
is $\tau \backslash C$ provided $\vdash e : \tau \backslash C$.

The intuition behind these inference rules is that a subtyping constraint $\tau_1 <: \tau_2$
indicates a potential flow from expressions of type $\tau_1$ to expressions of type $\tau_2$. The
rules generally follow standard presentations of Aiken-Wimmers constrained type systems,
except for the (Abs) and cell typing rules. Detailed descriptions of the other rules can be
found in [7, 3]. The (Abs) rule assigns each function a polymorphic type
$(\forall\, \bar{t}.\ t \to \tau \backslash C)$. In this rule, $\mathit{FreeTypeVar}(\cdot)$ is a function that extracts free type variables, $\bar{t}$
collects all the type variables generated when the inference is applied to the function
body, and $C$ collects all the constraints corresponding to the function body. The manner
in which $\forall$ type schemes are formed is similar to standard polymorphic constrained type
systems, but the significant difference here is that every function is given a $\forall$ type. By
contrast, in a system based on let-polymorphism, the let construct dictates where $\forall$
types are introduced.

The (New) rule assigns the reference cell type $\mathtt{ref}\ u$, with $u$, the type of the cell
content, initially unconstrained. In the (Read) rule, $\mathtt{read}\ t$ is the type for a cell whose
read result is of type $t$. In the (Write) rule, $\mathtt{write}\ \tau_2$ is the type for a cell assigned
a value of type $\tau_2$.

We take an intensional view of types: two types are equivalent if and only if they are
syntactically identical. In particular, $\forall$ types corresponding to different functions in the
program are always different, even though they might be $\alpha$-variants. This is because we
wish to distinguish different functions in the analysis to obtain precise flow properties.
For type soundness properties, an extensional view could be taken.

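We illustrate the inference rules with the example studied in [16]:

$$E_1 = (\lambda f.\ \mathtt{succ}\ ((f\ f)\ 0))\ (\mathtt{if0}\ n\ (\lambda x.x)\ (\lambda y.\lambda z.z))$$

To ease presentation, each program variable is inferred with a type variable having
exactly the same name. We have

$$
\begin{array}{l}
\vdash (\lambda f.\ \mathtt{succ}\ ((f\ f)\ 0)) : \tau_f \backslash \{\}, \text{ where } \tau_f = (\forall\, \{f, t_1, t_2, t_3, t_4\}.\ f \to \mathtt{int} \backslash \{f <: t_1 \to t_2,\ f <: t_1,\ t_2 <: t_3 \to t_4,\ \mathtt{int} <: t_3,\ t_4 <: \mathtt{int}\}), \\
\vdash (\lambda x.x) : \tau_x \backslash \{\}, \text{ where } \tau_x = (\forall\, \{x\}.\ x \to x \backslash \{\}), \\
\vdash (\lambda y.\lambda z.z) : \tau_y \backslash \{\}, \text{ where } \tau_y = (\forall\, \{y\}.\ y \to (\forall\, \{z\}.\ z \to z \backslash \{\}) \backslash \{\}), \\
\vdash E_1 : t_7 \backslash \{\mathtt{int} <: \mathtt{int},\ \tau_x <: t_5,\ \tau_y <: t_5,\ \tau_f <: t_6 \to t_7,\ t_5 <: t_6\}
\end{array}
$$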
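
The inference rules of Figure 1 can be read as a syntax-directed constraint generator. The following Python fragment is a minimal sketch of that reading, not the authors' implementation. The tuple encoding of expressions and types, the helper names `ftv`, `ftv_constraints` and `infer_program`, and the fresh-variable naming scheme (`t1`, `t2`, ..., `u1`, ...) are all assumptions made for illustration; as in the example above, each lambda-bound program variable reuses its own name as its type variable, so program variables are assumed not to clash with generated names.

```python
import itertools

INT = ("int",)  # value type int; type variables are plain strings

def ftv(t):
    """Free type variables of a type in the assumed tuple encoding."""
    if isinstance(t, str):
        return {t}
    if t[0] == "forall":
        _, bound, arg, ret, body = t
        return (ftv(arg) | ftv(ret) | ftv_constraints(body)) - bound
    return set().union(*(ftv(x) for x in t[1:]))

def ftv_constraints(cs):
    return set().union(*(ftv(l) | ftv(r) for l, r in cs))

def infer_program(expr):
    """Return (type, constraints) for a closed expression, one case per rule."""
    counter = itertools.count(1)
    fresh = lambda prefix="t": f"{prefix}{next(counter)}"

    def infer(env, e):
        tag = e[0]
        if tag == "var":                                   # (Var)
            return env[e[1]], set()
        if tag == "lit":                                   # (Int)
            return INT, set()
        if tag == "succ":                                  # (Succ)
            t, c = infer(env, e[1])
            return INT, {(t, INT)} | c
        if tag == "if0":                                   # (If0)
            (t1, c1), (t2, c2), (t3, c3) = (infer(env, x) for x in e[1:])
            t = fresh()
            return t, {(t1, INT), (t2, t), (t3, t)} | c1 | c2 | c3
        if tag == "lam":                                   # (Abs)
            x, body = e[1], e[2]
            ret, c = infer({**env, x: x}, body)
            bound = (ftv(x) | ftv(ret) | ftv_constraints(c)) - set(env.values())
            return ("forall", frozenset(bound), x, ret, frozenset(c)), set()
        if tag == "app":                                   # (Appl)
            (t1, c1), (t2, c2) = infer(env, e[1]), infer(env, e[2])
            arg, res = fresh(), fresh()
            return res, {(t1, ("fun", arg, res)), (t2, arg)} | c1 | c2
        if tag == "new":                                   # (New)
            return ("ref", fresh("u")), set()
        if tag == "read":                                  # (Read)
            t, c = infer(env, e[1])
            res = fresh()
            return res, {(t, ("read", res))} | c
        if tag in ("write", "seq"):                        # (Write), (Seq)
            (t1, c1), (t2, c2) = infer(env, e[1]), infer(env, e[2])
            extra = {(t1, ("write", t2))} if tag == "write" else set()
            return t2, extra | c1 | c2
        raise ValueError(f"unknown expression: {tag}")

    return infer({}, expr)

# The function part of E1: (lambda f. succ ((f f) 0))
LAM_F = ("lam", "f", ("succ", ("app", ("app", ("var", "f"), ("var", "f")), ("lit", 0))))
tau_f, top_constraints = infer_program(LAM_F)
```

Under these assumptions, `tau_f` carries the bound variables and the five body constraints listed for $\tau_f$ above, while the top-level constraint set is empty, as the (Abs) rule prescribes.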

2.4 Computation of the Closure 

The inference algorithm applied to program $e$ results in a type judgment $\vdash e : \tau \backslash C$.
For a flow analysis, we need to generate all the possible data-flow and control-flow paths 
and propagate value types along all the data-flow paths. This is achieved by applying 
the closure rules of Figure 2 to C, propagating information via deduction rules on the 
subtyping constraints. 

The rule (Trans) is the transitivity rule which models run-time data flow by
propagating value types forward along flow paths. The (Read) closure rule applies when a
read operation is applied to a cell of type $\mathtt{ref}\ u$, and the reading result is of type $t$. By
constraint $u <: t$, the cell content flows to the reading result. The (Write) closure rule
applies when a write operation is applied to a cell of type $\mathtt{ref}\ u$, and a value of type $\tau$
is assigned to the cell. By constraint $\tau <: u$, the value flows to the content of the cell.
With the (Read) and (Write) rules together, any value assigned to a cell flows to the cell’s
reading result. Flanagan [9] uses a related set of rules for references, which was the source
of the idea for us.
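
The interplay of (Trans), (Read) and (Write) can be sketched as a naive fixed-point computation. The fragment below is only an illustration under assumed encodings: type variables are strings, other types are tagged tuples, and the function name `close_cells` is hypothetical. The (Fun) rule is deliberately left out, so this is not the full closure of Figure 2.

```python
INT = ("int",)

def is_var(t):
    return isinstance(t, str)

def is_value(t):
    # value types: int, ref u, and closure types
    return isinstance(t, tuple) and t[0] in ("int", "ref", "forall")

def close_cells(constraints):
    """Least superset closed under (Trans), (Read) and (Write) only."""
    cs = set(constraints)
    while True:
        new = set()
        for l, r in cs:
            if not is_value(l):
                continue
            if is_var(r):
                # (Trans): tv <: t and t <: tau give tv <: tau
                new |= {(l, r2) for l2, r2 in cs if l2 == r}
            elif l[0] == "ref" and r[0] == "read":
                # (Read): ref u <: read t gives u <: t
                new.add((l[1], r[1]))
            elif l[0] == "ref" and r[0] == "write":
                # (Write): ref u <: write tau gives tau <: u
                new.add((r[1], l[1]))
        if new <= cs:
            return cs
        cs |= new

# Constraints for "x := 0; succ !x" when a cell of type ref u flows to x.
cell_example = {
    (("ref", "u"), "x"),
    ("x", ("write", INT)),   # from x := 0
    ("x", ("read", "r")),    # from !x
    ("r", INT),              # from succ !x
}
closed = close_cells(cell_example)
```

In this example the written value `int` reaches the cell content `u` through (Write) and then the read result `r` through (Read) and (Trans), which is the flow described in the paragraph above.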

The most important closure rule is (Fun), which performs $\forall$ elimination. The
constraint $(\forall\, \bar{t}.\ t \to \tau \backslash C) <: t_1 \to t_2$ indicates a function flowing to a call site, where
$(\forall\, \bar{t}.\ t \to \tau \backslash C)$ is the type for the function and $t_1 \to t_2$ is the type representing the
call site. The constraint $\tau v <: t_1$ means that a value of type $\tau v$ flows in as the actual
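
$$
\begin{array}{ll}
(\text{Trans}) & \dfrac{\tau v <: t,\quad t <: \tau}{\tau v <: \tau} \\[2ex]
(\text{Fun}) & \dfrac{(\forall\, \bar{t}.\ t \to \tau \backslash C) <: t_1 \to t_2,\quad \tau v <: t_1}{\tau v <: \Theta(t),\quad \Theta(\tau) <: t_2,\quad \Theta(C)} \quad \text{where } \Theta = \mathit{Poly}((\forall\, \bar{t}.\ t \to \tau \backslash C) <: t_1 \to t_2,\ \tau v) \\[2ex]
(\text{Read}) & \dfrac{\mathtt{ref}\ u <: \mathtt{read}\ t}{u <: t} \\[2ex]
(\text{Write}) & \dfrac{\mathtt{ref}\ u <: \mathtt{write}\ \tau}{\tau <: u}
\end{array}
$$

Fig. 2. Constraint Closure rules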



argument. At run-time, upon each function application, all local variables of the
function are allocated fresh locations on the stack. To model this behaviour in the analysis,
a renaming $\Theta \in \mathit{TypeVar} \rightharpoonup \mathit{TypeVar}$ is applied to type variables in $\bar{t}$. The partial
function $\Theta$ is extended to types, constraints, and constraint sets in the usual manner.
$\Theta(t)$ for $t \notin \bar{t}$ is defined to return $t$. We call $\Theta(\tau)$ an instantiation of $\tau$. Following the
terminology of Shivers’ nCFA [17], we call a renaming $\Theta$ a contour. The $\forall$ is eliminated
from $(\forall\, \bar{t}.\ t \to \tau \backslash C)$ by applying $\Theta$ to $C$. The (Fun) rule then generates additional
constraints to capture the flow from the actual argument $\tau v$ to the formal argument $\Theta(t)$,
and from the return value $\Theta(\tau)$ to the application result $t_2$. The (Fun) rule is parameterized
by function $\mathit{Poly} \in \mathit{Constraint} \times \mathit{ValueType} \to (\mathit{TypeVar} \rightharpoonup \mathit{TypeVar})$,
which decides for this particular function, call site, and actual argument type, which
contour is to be used (i.e., created or reused). Providing a concrete Poly instantiates the
framework to give a concrete algorithm. For example, the monovariant analysis 0CFA
is defined by letting Poly always return the identity renaming. This particular example
shows how Poly may reuse existing contours. The differing analyses are defined by
differing Poly which use different strategies for sharing contours. In the next section
we show how this works by presenting some particular Poly.

Definition 2.4 (Closure): For a constraint set $C$, $\mathit{Closure}_{Poly}(C)$ is the least superset
of $C$ closed under the closure rules of Figure 2.

This closure is well-defined since the rules can be seen to induce a monotone function 
on constraint sets. By this definition, some Poly may produce infinite closures since 
infinitely many contours may be created. Such analyses are still worthy of study even 
though they are usually unimplementable. 

Definition 2.5 (Flow Analysis): Define $\mathit{Analysis}_{Poly}(e) = \mathit{Closure}_{Poly}(C)$, where the
inference algorithm infers $\vdash e : \tau \backslash C$.

The output of an analysis is a set of constraints, which is the closure of the constraint
set generated by the inference rules. The closure contains complete flow information
about the program; various program properties can be deduced from it.

Definition 2.6 (Type-Checking): A program $e$ is well-typed iff $\mathit{Analysis}_{Poly}(e)$
contains no immediately type-contradictory constraints such as $\mathtt{ref}\ u <: t \to t'$.

For example, analyzing program $\mathtt{succ}\ (\lambda x.x)$ would generate a type-contradictory
constraint $(\forall\, \{x\}.\ x \to x \backslash \{\}) <: \mathtt{int}$, which indicates a type error. A computation
state is wrong if computation cannot continue due to a type error. Our type system does
not statically check for errors due to reading uninitialized cells.

To illustrate how the results of a conventional control flow analysis can be obtained
in our framework, we use the fact that by the structure of the inference rules, every $\forall$
type in the closure corresponds to a unique lambda abstraction in the program.

Definition 2.7 (Control Flow Analysis): For an expression $e$ in the program, if $e$ is
assigned type $\tau$ by the inference rules, the function corresponding to $(\forall\, \bar{t}'.\ t' \to \tau' \backslash C')$
is considered flowing to $e$ if either $\tau = (\forall\, \bar{t}'.\ t' \to \tau' \backslash C')$, or
$(\forall\, \bar{t}'.\ t' \to \tau' \backslash C') <: t \in \mathit{Analysis}_{Poly}(e)$ and either $\tau = t$ or $t$ is an instantiation of $\tau$.

The above definition includes two cases: either $e$ is directly assigned a $\forall$ type,
in which case $e$ is a lambda abstraction which trivially flows to itself; or $e$ is assigned
a type variable by the inference rules, and the type variable or an instantiation of it has
a $\forall$ type as lower bound.

A subject reduction property for our type system can be established, with a proof 
similar to the one in [7]. The subject reduction property implies the type soundness and 
flow soundness of the framework. 

Theorem 2.8 (Subject Reduction, Type Soundness, Flow Soundness):

1. The type system has a subject reduction property;
2. A well-typed program $e$ cannot go wrong during execution;
3. If an expression evaluates to a closure value of a function, the function is considered
flowing to the expression by the control flow analysis.

The soundness of the framework implies that any analysis defined as an instantiation 
of the framework is also sound. 



3 Instantiating the Framework 

In this section we present various polyvariant algorithms as instantiations of our
framework.



3.1 nCFA Instantiation 

In Shivers’ nCFA analysis [17], each function application (call) is associated with a 
call-string of length at most n. The call-string contains the last n or fewer calls on the 
call-path leading to this application. Applications of the same function share the same 
contour (i.e., analysis of the function) if they have the same call-string. To present nCFA 
in our framework, type variables are defined with superscripts that denote the call-string: 



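$$
\begin{array}{rcl}
a & \in & \mathit{Identifier} \\
s \in \mathit{Superscript} & = & \mathit{Identifier\ List} \\
t \in \mathit{TypeVar} & ::= & a^{s}
\end{array}
$$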

We use the following list notation: the empty list is $[\,]$, $[a_1, \ldots, a_m]$ is a list of $m$
elements, $l_1 @ l_2$ appends lists $l_1$ and $l_2$, and $l(1..n)$ is the list consisting of the first
$\min(n, \mathit{length}(l))$ elements of list $l$. Each type variable $a^{s}$ is tagged with a call-string
$s$. All type variables generated by the inference rules have empty lists as superscripts.
By the inference rule (Appl), a call site is inferred with a type $a_1^{[\,]} \to a_2^{[\,]}$; we use $a_2$ to
identify this call site, thus a call-string is a list of such identifiers. All bound type variables
of a $\forall$ type have empty list superscripts. When the $\forall$ quantifier is eliminated by the (Fun)
closure rule, those bound type variables are renamed by changing the superscripts from
empty lists to the appropriate call-strings.

Definition 3.1 (nCFA Algorithm): The nCFA algorithm is defined as the instantiation
of the framework with $\mathit{Poly} = \mathit{CFA}$, where

$$\mathit{CFA}((\forall\, \bar{t}.\ t \to \tau \backslash C) <: t_1 \to a_2^{s_2},\ \tau v) = \Theta, \text{ where for each } a^{[\,]} \in \bar{t},\ \Theta(a^{[\,]}) = a^{s'}, \text{ where } s' = ([a_2] @ s_2)(1..n)$$

It can be shown by induction that $s'$ is the call-string for application
$(\forall\, \bar{t}.\ t \to \tau \backslash C) <: t_1 \to a_2^{s_2}$. The definition of $\Theta$ ensures that applications of the same function
share the same contour if and only if they have the same call-string.
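
The call-string computation $s' = ([a_2] @ s_2)(1..n)$ is small enough to sketch directly. In the Python fragment below, call-strings are tuples of call-site identifiers and a renamed variable $a^{s'}$ is encoded as the pair `(a, s')`; both encodings, and the names `next_call_string` and `cfa_contour`, are assumptions for illustration only.

```python
def next_call_string(site, caller_string, n):
    """s' = ([a2] @ s2)(1..n): prepend the call site, keep at most n entries."""
    return ((site,) + tuple(caller_string))[:n]

def cfa_contour(bound_vars, site, caller_string, n):
    """Contour for one application: rename every bound variable a to a^{s'}."""
    s = next_call_string(site, caller_string, n)
    return {a: (a, s) for a in bound_vars}

# Two applications at the same call site whose callers differ only beyond depth n.
theta_a = cfa_contour({"x", "t1"}, "a2", ("a7", "a9"), n=2)
theta_b = cfa_contour({"x", "t1"}, "a2", ("a7", "a4"), n=2)
# A deeper analysis (n = 3) distinguishes the two call paths.
theta_c = cfa_contour({"x", "t1"}, "a2", ("a7", "a9"), n=3)
theta_d = cfa_contour({"x", "t1"}, "a2", ("a7", "a4"), n=3)
```

With `n=2` both applications receive the call-string `("a2", "a7")` and therefore share a contour, while `n=3` separates them. With `n=0` every call-string is empty, so each bound variable keeps its empty superscript, which matches the monovariant behaviour of 0CFA.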

Not only is nCFA inefficient, but even for large $n$ it may be imprecise. Applying
nCFA to program $E_1$, since $(\lambda f. \ldots)$ has only one application, the (Fun) rule generates
only one contour $\Theta$ for this function, resulting in $\tau_x <: \Theta(f)$ and $\tau_y <: \Theta(f)$. This
means both $(\lambda x.x)$ and $(\lambda y.\lambda z.z)$ flow to $f$, and at the application site $f\ f$ there are
four applications. One of them, $(\lambda x.x)$ applying to $(\lambda y.\lambda z.z)$, leads to a type error:
$(\forall\, \{z\}.\ z \to z \backslash \{\}) <: \mathtt{int}$. Hence nCFA fails to type-check $E_1$ for arbitrary $n$.

3.2 Idealized CPA 

The Cartesian Product Algorithm (CPA) [1, 2] is a concrete type inference algorithm
for object-oriented languages. For a message sending expression, CPA computes the 
cartesian product of the types for the actual arguments. For each element of the cartesian 
product, the method body is analyzed exactly once with one contour generated. The 
calling-contexts of a method are partitioned by the cartesian product, rather than by 
call-strings as in nCFA. In our language, each function has only one argument. For each 
function, CPA generates exactly one contour for each distinct argument type that the 
function may be applied to. Without a termination check, CPA may fail to terminate 
for some programs. We first present an idealized CPA which may produce an infinite 
closure, and in Section 5 show how a terminating CPA analysis may be defined which 
keeps the closure finite. To present CPA, type variables are defined with structure: 

$$
\begin{array}{rcl}
a & \in & \mathit{Identifier} \\
t \in \mathit{TypeVar} & ::= & a \mid a^{\tau v}
\end{array}
$$

The inference rules are constrained to generate type variables without superscripts. 

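Definition 3.2 (Idealized CPA algorithm): The Idealized CPA algorithm is the
instantiation of the framework with $\mathit{Poly} = \mathit{CPA}$, where

$$\mathit{CPA}((\forall\, \bar{t}.\ t \to \tau \backslash C) <: t_1 \to t_2,\ \tau v) = \Theta, \text{ where for each } a \in \bar{t},\ \Theta(a) = a^{\tau v}$$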

The contours $\Theta$ are generated based on the actual argument type $\tau v$, independent of
the application site $t_1 \to t_2$. This is the opposite of CFA, which ignores the value type
$\tau v$ and only uses the call site $t_1 \to t_2$. Given a particular function and its associated $\forall$
type in a program, this algorithm will generate a unique contour ($\forall$ elimination) for each
distinct value type the function is applied to. It may, however, share contours across call
sites. Agesen [2] presents convincing experimental evidence that the CPA approach is
both more efficient and more feasible than nCFA.

We now sketch what Idealized CPA will produce when applied to program $E_1$. Even
though there is only one application site for $(\lambda f. \ldots)$, it applies to two different actual
argument values. So, the (Fun) rule generates two contours $\Theta_1$ and $\Theta_2$ for $(\lambda f. \ldots)$ with
$\Theta_1(f) = f^{\tau_x}$, $\tau_x <: f^{\tau_x}$, $\Theta_2(f) = f^{\tau_y}$, $\tau_y <: f^{\tau_y}$. At application site $f\ f$, there would
be only two applications: $(\lambda x.x)$ applying to itself and $(\lambda y.\lambda z.z)$ applying to itself. Thus
the program is type-checked successfully.
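
The contrast between a monovariant analysis and Idealized CPA on $E_1$ can be replayed with a small executable sketch of the (Trans) and (Fun) closure rules, parameterized by Poly. This is an illustrative Python reading, not the authors' implementation: the tuple encoding, the representation of $a^{\tau v}$ as `("inst", a, tv)`, and all function names are assumptions. The cell rules are omitted because $E_1$ has no cells, and the naive fixed-point loop terminates here only because $E_1$ yields finitely many contours.

```python
INT = ("int",)

def forall(bound, arg, ret, body=()):
    return ("forall", frozenset(bound), arg, ret, frozenset(body))

def is_var(t):
    # plain variables are strings; a^{tv} is encoded as ("inst", a, tv)
    return isinstance(t, str) or t[0] == "inst"

def is_value(t):
    return not is_var(t) and t[0] in ("int", "forall", "ref")

def subst(theta, t):
    """Apply a renaming to a type; variables outside theta are left alone."""
    if is_var(t):
        return theta.get(t, t)
    if t[0] == "forall":
        _, bound, arg, ret, body = t
        return ("forall", bound, subst(theta, arg), subst(theta, ret),
                frozenset((subst(theta, l), subst(theta, r)) for l, r in body))
    return (t[0],) + tuple(subst(theta, x) for x in t[1:])

def poly_0cfa(fn, site, arg):
    return {}                                    # identity renaming: one contour

def poly_cpa(fn, site, arg):
    return {a: ("inst", a, arg) for a in fn[1]}  # Theta(a) = a^{tv}

def closure(constraints, poly):
    cs = set(constraints)
    while True:
        new = set()
        for l, r in cs:
            if not is_value(l):
                continue
            if is_var(r):                              # (Trans)
                new |= {(l, r2) for l2, r2 in cs if l2 == r}
            elif l[0] == "forall" and r[0] == "fun":   # (Fun)
                _, _, arg, ret, body = l
                for v, t in cs:
                    if t == r[1] and is_value(v):
                        theta = poly(l, r, v)
                        new.add((v, subst(theta, arg)))
                        new.add((subst(theta, ret), r[2]))
                        new |= {(subst(theta, a), subst(theta, b)) for a, b in body}
        if new <= cs:
            return cs
        cs |= new

def type_errors(cs):
    ok = {("int", "int"), ("forall", "fun")}
    return {(l, r) for l, r in cs
            if is_value(l) and not is_var(r) and (l[0], r[0]) not in ok}

tau_x = forall({"x"}, "x", "x")
tau_z = forall({"z"}, "z", "z")
tau_y = forall({"y"}, "y", tau_z)
tau_f = forall({"f", "t1", "t2", "t3", "t4"}, "f", INT,
               [("f", ("fun", "t1", "t2")), ("f", "t1"),
                ("t2", ("fun", "t3", "t4")), (INT, "t3"), ("t4", INT)])
E1 = {(INT, INT), (tau_x, "t5"), (tau_y, "t5"),
      (tau_f, ("fun", "t6", "t7")), ("t5", "t6")}

errors_0cfa = type_errors(closure(E1, poly_0cfa))
errors_cpa = type_errors(closure(E1, poly_cpa))
```

Under these assumptions, the single shared contour of `poly_0cfa` lets $(\lambda z.z)$ reach an `int` position, so `errors_0cfa` contains the contradictory constraint `(tau_z, INT)`, whereas `errors_cpa` is empty because the two argument types of $(\lambda f. \ldots)$ are analyzed in separate contours.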



4 Data Polymorphism 

Data polymorphism is defined informally in [2] as the ability of an imperative program 
variable to hold values of different types at run-time. In our language, a more precise 
definition could be that cells created from a single imperative creation point (new ex- 
pression) in the program could be assigned with run-time values of different types. CPA 
addresses parametric polymorphism effectively, but may lose precision in the presence 
of data polymorphism. For example, consider when CPA is applied to the program 

$$E_2 = (\lambda f.\ (\lambda x.\ x := 0;\ \mathtt{succ}\ {!x})\ (f\ 1);\ (f\ 2) := (\lambda y.y))\ (\lambda z.\mathtt{new})$$

Function $(\lambda y.y)$ has type $(\forall\, \{y\}.\ y \to y \backslash \{\})$, and $(\lambda z.\mathtt{new})$ has type
$(\forall\, \{z, u\}.\ z \to \mathtt{ref}\ u \backslash \{\})$. The two applications of $(\lambda z.\mathtt{new})$ have the same actual argument type int, so one
contour $\Theta$ is shared by the two applications. At run-time the two applications return two
distinct cells, but in the CPA closure, the two cells share type $\mathtt{ref}\ u'$ (assume $\Theta(u) = u'$),
since there is only one contour for $(\lambda z.\mathtt{new})$. At run-time, one cell is assigned
0, and the other is assigned $(\lambda y.y)$. The two assignments are both reflected on
$u'$ as constraints $\mathtt{int} <: u'$ and $(\forall\, \{y\}.\ y \to y \backslash \{\}) <: u'$, as if there were only one
cell, which is assigned values of two different types. This leads to a type error:
$(\forall\, \{y\}.\ y \to y \backslash \{\}) <: \mathtt{int}$. But if distinct contours were used for the two applications
of $(\lambda z.\mathtt{new})$, the two cells would have separate cell types and the program would be
type-checked.

This small example illustrates that data polymorphism is a problem that arises in a 
function that contains a creation point (a new expression). Different applications of the 
function may create different cells which are assigned values of different types; a
precise analysis should disambiguate these cells by letting them have separate cell types.
To illustrate how data polymorphism can be modeled in our framework, we present a 
refinement of CPA to give better precision in the analysis of data polymorphic programs. 

Consider two applications of a single function. If the applications have the same actual
argument type, then CPA generates a single contour for them. But if the two applications
return separate mutable data structures at run-time, and the data structures are modified
differently after being returned from the two different applications, CPA would lose
precision by merging the data structures together. If two separate contours were used for
the two applications, the imprecision could be avoided. In the result of CPA analysis,
such a function has a return type which is a mutable data structure with polymorphic
contents. We call such functions data-polymorphic. In program $E_2$, $(\lambda z.\mathtt{new})$ is
data-polymorphic, and the other functions are data-monomorphic.

Based on the above observation, our Data-Adaptive CPA algorithm is a two-pass analysis.
The first pass is just CPA. From the CPA closure, we detect a set of functions which
are possibly data-polymorphic. In the second pass, for data-polymorphic functions, a 
distinct contour is generated for every function application. In this way, imprecision asso- 
ciated with data-polymorphic functions can be avoided; only CPA splitting is performed 
for data-monomorphic functions, avoiding generation of redundant contours. 

Mutable data structures with polymorphic contents are detected from the CPA closure
with the following definition:

Definition 4.1 (Data Polymorphic Types): Type $\tau$ is data-polymorphic in constraint
set $C$ if any of the following cases hold:

1. $\tau = \mathtt{ref}\ u$, $\tau v_1 <: u \in C$, $\tau v_2 <: u \in C$, and $\tau v_1 \neq \tau v_2$;
2. $\tau$ is a type variable $t$, $\tau' <: t \in C$, and $\tau'$ is data-polymorphic in $C$;
3. $\tau = \mathtt{ref}\ u$ and $u$ is data-polymorphic in $C$;
4. $\tau = (\forall\, \bar{t}'.\ t' \to \tau' \backslash C')$ and $\tau'$ is data-polymorphic in $C$.

The above definition is inductive. The first case is the base case, detecting cell types 
with polymorphic contents. The second case declares a type variable as data-polymorphic 
when it has a data polymorphic lower bound. The remaining two cases are inductive cases 
based on the idea that a type is data-polymorphic if it has a data-polymorphic component. 
Particularly, a closure type is declared as data-polymorphic when the type of its return 
value is data-polymorphic. Note that, for purely functional programs with no usage of 
cells, no types would be detected as data-polymorphic. 
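
The four cases of the definition translate into a short recursive check. The Python sketch below assumes the same illustrative tuple encoding of types used informally here (strings for type variables, tagged tuples otherwise); the function name `data_polymorphic` and the `seen` set, which guards against cyclic constraints, are assumptions and not part of the paper's definition. Since the definition is inductive and purely disjunctive, treating a revisited type as contributing no new evidence computes the least solution.

```python
INT = ("int",)

def is_value(t):
    return isinstance(t, tuple) and t[0] in ("int", "ref", "forall")

def data_polymorphic(t, cs, seen=None):
    """Check whether type t is data-polymorphic in constraint set cs."""
    seen = set() if seen is None else seen
    if t in seen:          # already under examination: no new evidence
        return False
    seen.add(t)
    if isinstance(t, str):
        # case 2: a type variable with a data-polymorphic lower bound
        return any(data_polymorphic(l, cs, seen) for l, r in cs if r == t)
    if t[0] == "ref":
        u = t[1]
        lower_bounds = {l for l, r in cs if r == u and is_value(l)}
        # case 1: two distinct value types flow into the cell content
        # case 3: the content itself is data-polymorphic
        return len(lower_bounds) >= 2 or data_polymorphic(u, cs, seen)
    if t[0] == "forall":
        # case 4: a closure type whose return type is data-polymorphic
        return data_polymorphic(t[3], cs, seen)
    return False

# Fragment of the CPA closure of E2: both int and (lambda y.y) flow into u'.
tau_y = ("forall", frozenset({"y"}), "y", "y", frozenset())
e2_fragment = {(INT, "u'"), (tau_y, "u'")}
shared_cell = ("ref", "u'")
new_fn_instance = ("forall", frozenset({"z", "u"}), "z", shared_cell, frozenset())
```

Here `data_polymorphic(shared_cell, e2_fragment)` holds by case 1, and the closure type `new_fn_instance`, standing for an instantiation of the type of $(\lambda z.\mathtt{new})$, is data-polymorphic by case 4.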

Recall that CPA type variables are either of the form $a$ or $a^{\tau v}$. We define an operation
erase on type variables as: $\mathit{erase}(a) = a$, $\mathit{erase}(a^{\tau v}) = a$. We extend it naturally
to types, defining $\mathit{erase}(\tau)$ as the type with all superscripts erased from all type variables
in $\tau$. In particular, erase maps a closure type to the type for the lambda abstraction in
the program corresponding to the closure type, and it maps a cell type to the type for the
corresponding creation point (new expression) in the program. From now on, we call $\tau v$
an instantiation of $\mathit{erase}(\tau v)$.
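
As a small illustration, erase can be written as a structural recursion over types. The Python sketch below assumes a tuple encoding in which a plain variable $a$ is a string and a superscripted variable $a^{\tau v}$ is `("inst", a, tv)`; this encoding is hypothetical and chosen only for the example.

```python
INT = ("int",)

def erase(t):
    """Strip superscripts from every type variable occurring in a type."""
    if isinstance(t, str):
        return t                      # erase(a) = a
    if t[0] == "inst":
        return t[1]                   # erase(a^{tv}) = a
    if t[0] == "forall":
        _, bound, arg, ret, body = t
        return ("forall", frozenset(erase(b) for b in bound), erase(arg), erase(ret),
                frozenset((erase(l), erase(r)) for l, r in body))
    return (t[0],) + tuple(erase(x) for x in t[1:])

# A cell type created in the contour for argument type int: ref u^{int}.
cell_instance = ("ref", ("inst", "u", INT))
# A closure type whose body mentions a renamed free variable d^{int}.
closure_instance = ("forall", frozenset({"y", "t"}), "y", "t",
                    frozenset({(("inst", "d", INT), ("fun", "y", "t"))}))
```

With this encoding, `erase(cell_instance)` gives `("ref", "u")`, the type of the creation point, and `erase(closure_instance)` gives the closure type with `d` restored, which is the type the inference rules assign to the lambda abstraction.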

Definition 4.2 (Data Polymorphic Functions): For function $\lambda x.e$ assigned type
$(\forall\, \bar{t}.\ t \to \tau \backslash C)$ by the inference rules, $\lambda x.e$ is a data-polymorphic function in
constraint set $C'$ iff there appears $\tau'$ in $C'$ s.t. $\mathit{erase}(\tau') = \tau$ and $\tau'$ is data-polymorphic
in $C'$.

In the above definition we use the fact that every distinct function in the program is
given a unique type $(\forall\, \bar{t}.\ t \to \tau \backslash C)$ by the inference rules. The constraint set $C'$ is a
flow analysis result of the program. The condition $\mathit{erase}(\tau') = \tau$ means that the function
$\lambda x.e$ may return a value of type $\tau'$. Since $\tau'$ is data-polymorphic in $C'$, we know that,
according to analysis result $C'$, the function may return mutable data structures with
polymorphic contents, and we declare it a data-polymorphic function.

Definition 4.3 (Data-Adaptive CPA): For program $e$, Data-Adaptive CPA is an
instantiation of the framework with $\mathit{Poly} = \mathit{DCPA}$, where

$$\mathit{DCPA}((\forall\, \bar{t}.\ t \to \tau \backslash C) <: t_1 \to t_2,\ \tau v) = \Theta, \text{ where for each } a \in \bar{t},$$

$$
\Theta(a) = \begin{cases}
a' \text{ where } a' \text{ is a fresh identifier,} & \text{if } \mathit{erase}((\forall\, \bar{t}.\ t \to \tau \backslash C)) \text{ is the type for a data-polymorphic function in } \mathit{Analysis}_{CPA}(e) \\
a^{\tau v} & \text{otherwise}
\end{cases}
$$

The second pass of Data-Adaptive CPA differs from CPA only when the function is
detected as data-polymorphic in the closure obtained by the first CPA pass. In this case, a
new contour is always generated for every application. We now illustrate Data-Adaptive
CPA on program $E_2$. After the first CPA pass, we have

$$\{\mathtt{int} <: u',\ (\forall\, \{y\}.\ y \to y \backslash \{\}) <: u'\} \subseteq \mathit{Analysis}_{CPA}(E_2)$$

Thus $u'$ is data-polymorphic, and so is $\mathtt{ref}\ u'$. Since $\lambda z.\mathtt{new}$ is inferred with type
$(\forall\, \{z, u\}.\ z \to \mathtt{ref}\ u \backslash \{\})$ and $\mathit{erase}(\mathtt{ref}\ u') = \mathtt{ref}\ u$, $\lambda z.\mathtt{new}$ is a data-polymorphic
function. In the second pass, the two applications of $\lambda z.\mathtt{new}$ have separate contours, and
the program type-checks.

We briefly sketch how Data-Adaptive CPA could be applied to data polymorphism 
in object-oriented programming. We illustrate the ideas by assuming an encoding of 
instance variables as cells, objects as records (which we expect can be added to our 
language without great difficulty), classes as class functions, and object creation as 
application of class functions. An example of such an encoding is presented in [7]. 
Consider applying such an encoding to the Java program fragment of Figure 3: The 



class Box {
    public Object content;
    public void set(Object obj) {
        content = obj;
    }
    public Object get() {
        return content;
    }
}
Box box1 = new Box(); box1.set(new Integer(0));
Box box2 = new Box(); box2.set(new Boolean(true));
... box1.get() ...



Fig. 3. Java program exhibiting the need for data polymorphism 



two new Box() expressions would be encoded as two applications of the class function
for class Box. When CPA is applied, since the two applications always apply to arguments
of the same type in any object encoding, the two applications share a single contour. Thus
the two Box instances share the same object type, and the analysis would imprecisely
conclude that the result of box1.get() includes object type Boolean. When
Data-Adaptive CPA is applied, from the closure of the first CPA pass, the instance variable
content would be detected as being associated with a data-polymorphic cell type. Since
the class function for class Box returns an object value with content as a component,
the class function would be detected as a data-polymorphic function. During the second
pass, the two applications of the class function would have separate contours, thus the
two instances of Box would have separate types and the imprecision would be avoided.

For programs with much data polymorphism, Data-Adaptive CPA may become
impractical as many functions are detected as data-polymorphic. As in Agesen’s CPA
implementation [2], a practical implementation should restrict the number of contours
generated.

Plevyak and Chien’s iterative flow analysis (IFA) [14] uses an iterative approach for 
precise analysis of data polymorphic programs. The first pass analyzes the program by 
letting objects of the same class share the same object contour. Every pass detects a set of 
confluence points (imprecise flow graph nodes where different data values merge) based 
on the result of the previous pass, and generates more contours with the aim of resolving the
imprecision at confluence points. The iteration continues until a fixed-point is reached.
The advantage of IFA is that splitting is performed only when it is profitable, yet every
pass is a whole-program analysis and the number of passes needed could be large. A
completely different approach that could also be considered is to use declarative
parametric polymorphism [5] to guide the analysis of data polymorphism.



5 Terminating CPA Analyses 

Any instantiation of our polyvariant framework terminates when only finitely many
distinct contours are generated. The nCFA algorithms we defined terminate for arbitrary
programs since the number of call-strings of length no more than $n$ is finite.
Unfortunately, the Idealized CPA and Data-Adaptive CPA algorithms fail to terminate for some
programs.

Agesen [2] develops various methods to detect recursion and avoid the generation 
of infinitely many contours over recursive functions in his CPA implementation. One 
approach is to construct a call-graph during analysis, and restrict the number of contours 
generated along a cycle in the call-graph. However, for Idealized CPA, adding call-graph 
cycle detection is not enough to ensure termination. Consider the program 

$$E_3 = (\lambda c.\ c := \lambda x.x;\ (\lambda d.\ c := (\lambda y.\ d\ y))\ {!c})\ \mathtt{new}$$

Its call-graph has only one edge: function $(\lambda c. \ldots)$ calls $(\lambda d. \ldots)$. There is no cycle in it.
Consider running Idealized CPA on the program. For each value type lower bound of
$u$ (assume the cell has type $\mathtt{ref}\ u$), there is a contour generated for function $(\lambda d. \ldots)$. At
first the type for $f_0 = (\lambda x.x)$ becomes a lower bound of $u$, one contour is generated for
function $(\lambda d. \ldots)$, and the type for closure $f_1 = (\lambda y.\ f_0\ y)$ becomes another lower bound
of $u$. So another contour is generated for $(\lambda d. \ldots)$, and the type for closure $f_2 = (\lambda y.\ f_1\ y)$
also becomes a lower bound of $u$. This process would repeat forever, with an infinite
number of contours generated for function $(\lambda d. \ldots)$. This example shows that the
call-graph-based approach cannot ensure the termination of Idealized CPA.

Here we present a novel approach that ensures the termination of CPA for arbitrary
programs. Our approach is based on the following observation: when Idealized CPA
fails to terminate for a program, there must be a cyclic dependency relation among
those functions having infinitely many contours. In the example, there exists such a
cyclic relation: function $(\lambda y. \ldots)$ is lexically enclosed by $(\lambda d. \ldots)$, and $(\lambda d. \ldots)$ applies
to closure values corresponding to $(\lambda y. \ldots)$. If we detect such cycles and restrict the
number of contours generated for functions appearing in cycles, non-termination could
be avoided. To be precise, the key of our method is to construct a relation among value
types during closure computation. This relation is defined as:

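Definition 5.1 (Flow Dependency, $\Rightarrow$): For constraint set $C$, define $\Rightarrow$ as a relation
among value types such that if either

$$\tau v_1 <: t \to t',\quad \tau v_2 <: t \in C,\quad \tau v_1 = (\forall\, \bar{t}_1.\ t_1 \to \tau_1 \backslash C_1)$$

or

$$\tau v_1 \text{ occurs as a subterm of } (\forall\, \bar{t}_2.\ t_2 \to \tau_2 \backslash C_2),\quad \tau v_2 = (\forall\, \bar{t}_2.\ t_2 \to \tau_2 \backslash C_2),\quad \tau v_1 \neq \tau v_2,$$
$$\text{and there exists a } t \in \bar{t}_2 \text{ such that } t \text{ appears in } \tau v_1$$

holds, then $\mathit{erase}(\tau v_1) \Rightarrow \mathit{erase}(\tau v_2)$ in $C$.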

The first case above defines a dependency when closure type $\tau v_1$ applies to value
type $\tau v_2$. The second case defines a dependency when closure type $\tau v_2$ contains value
type $\tau v_1$ as a subterm, so that when a new contour is generated for closure type $\tau v_2$, a
new value type is created which is $\tau v_1$ with some of its free type variables renamed. If
$\tau v_1 \Rightarrow \tau v_2$ in $C_1$, and $C_1 \subseteq C_2$, we have $\tau v_1 \Rightarrow \tau v_2$ in $C_2$. Thus relation $\Rightarrow$ can be
incrementally computed along with the incremental closure computation. We abbreviate
$\tau v_1 \Rightarrow \tau v_2$ in $C$ as $\tau v_1 \Rightarrow \tau v_2$ when $C$ refers to the current closure under computation.
We call $\tau v_1 \Rightarrow \tau v_2, \ldots, \tau v_n \Rightarrow \tau v_1$ a cycle, and we write $\tau v_1 \Rightarrow^{+} \tau v_n$ if there exists a
sequence $\tau v_1 \Rightarrow \tau v_2, \ldots, \tau v_{n-1} \Rightarrow \tau v_n$.
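
The incremental bookkeeping described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's algorithm: erased value types are stood in for by plain strings, and the names `FlowDependency`, `add`, `reaches` and `on_cycle` are hypothetical. Because edges are only ever added, anything reachable in a smaller constraint set stays reachable in a larger one, which mirrors the monotonicity remark above.

```python
class FlowDependency:
    """Hypothetical container for a flow dependency relation on erased value types."""

    def __init__(self):
        # Maps an erased type to the set of erased types it directly depends on.
        self.successors = {}

    def add(self, source, target):
        """Record one dependency edge source => target (edges are never removed)."""
        self.successors.setdefault(source, set()).add(target)

    def reaches(self, source, target):
        """Return True if there is a path of at least one edge from source to target."""
        seen = set()
        stack = list(self.successors.get(source, ()))
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.successors.get(node, ()))
        return False

    def on_cycle(self, node):
        """A node lies on a cycle exactly when it reaches itself."""
        return self.reaches(node, node)


deps = FlowDependency()
deps.add("T_y", "T_d")   # assumed edge: the inner function is enclosed by the outer one
deps.add("T_d", "T_y")   # assumed edge: the outer function is applied to inner closures
deps.add("T_z", "T_y")   # an unrelated type that merely feeds into the cycle
```

In this toy relation, `T_y` and `T_d` lie on a cycle while `T_z` does not, so only the first two would have their contours restricted.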

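Definition 5.2 (Terminating CPA): Terminating CPA is the instantiation of the
framework obtained by defining Poly as:

$\mathit{Poly}((\forall \bar{\alpha}.\ t \to \tau \setminus C) <: t_1 \to t_2,\ \tau v') = \Theta$, where for each $\alpha \in \bar{\alpha}$,

\[ \Theta(\alpha) = \begin{cases} \alpha^{\mathit{erase}(\tau v')} & \text{if } \mathit{erase}(\tau v') \Rightarrow^{+} \mathit{erase}((\forall \bar{\alpha}.\ t \to \tau \setminus C)) \\ \alpha^{\tau v'} & \text{otherwise} \end{cases} \]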

The new algorithm differs from Idealized CPA in just one case: when a closure of
type $(\forall \bar{\alpha}.\ t \to \tau \setminus C)$ is applied to argument type $\tau v'$ and we have $\mathit{erase}(\tau v') \Rightarrow^{+}
\mathit{erase}((\forall \bar{\alpha}.\ t \to \tau \setminus C))$ in the current closure, then by the definition of $\Rightarrow$ there would
be a cycle: $\mathit{erase}(\tau v') \Rightarrow^{+} \mathit{erase}((\forall \bar{\alpha}.\ t \to \tau \setminus C)) \Rightarrow \mathit{erase}(\tau v')$. In this case, instead
of renaming type variables in $\bar{\alpha}$ as in Idealized CPA, they are renamed to a form only
dependent on $\mathit{erase}(\tau v')$. In this way, even if $(\forall \bar{\alpha}.\ t \to \tau \setminus C)$ applies to different types
which are different instantiations of $\mathit{erase}(\tau v')$, there is only one contour generated for
them. We will prove shortly that this ensures termination of the closure computation.
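
The contour choice just described can be illustrated with a small Python sketch. It is a toy model under loudly stated assumptions: a value type is represented as a pair `(shape, instance)`, `erase` simply drops the instance, the dependency relation is a plain dictionary of edges, and the names `reaches`, `erase` and `contour_key` are hypothetical. It only shows how keying contours on the erased argument type collapses all instantiations into one contour once a cycle is present.

```python
def reaches(edges, source, target):
    """Return True if target is reachable from source by at least one edge."""
    seen = set()
    stack = list(edges.get(source, ()))
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, ()))
    return False


def erase(value_type):
    """Toy erasure: forget which instantiation this is, keep only the shape."""
    shape, _instance = value_type
    return shape


def contour_key(closure, argument, edges):
    """Pick the key that identifies the contour created for this application.

    If the erased argument type flows back to the erased closure type, the
    application sits on a cycle, so the key depends only on the erased
    argument; otherwise the full argument type is used, as in Idealized CPA.
    """
    if reaches(edges, erase(argument), erase(closure)):
        return erase(argument)
    return argument


cyclic = {"T_d": {"T_y"}, "T_y": {"T_d"}}   # assumed edges forming a cycle
acyclic = {"T_d": {"T_y"}}                  # same application, no edge back
closure = ("T_d", 0)
```

With the cyclic relation, the arguments `("T_y", 1)` and `("T_y", 2)` share one contour key; with the acyclic relation they keep separate keys.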

Applying the algorithm to the example above, suppose that, by the inference rule (Abs),
function $(\lambda d.\ldots)$ has type $\tau_d$ and function $(\lambda y.\ldots)$ has type $\tau_y$. Since $(\lambda d.\ldots)$ lexically
encloses $(\lambda y.\ldots)$, we have $\tau_y \Rightarrow \tau_d$, and, since $(\lambda d.\ldots)$ applies to closures of $(\lambda y.\ldots)$,
we also have $\tau_d \Rightarrow \tau_y$. Thus a cycle is detected, only two contours are generated for
$(\lambda d.\ldots)$, and the algorithm terminates.
Theorem 5.3 (Termination): The Terminating CPA analysis terminates for arbitrary
programs.

Proof: Suppose not, i.e., for some program, its Terminating CPA closure $C$ contains
a $\forall$ type $\tau v_0$ which has an infinite number of contours. Then, there must exist at least
one $\tau v_1$ s.t. $\tau v_0$ takes as arguments an infinite number of instantiations of $\mathit{erase}(\tau v_1)$,
and an infinite number of contours are generated for those applications. To have an
infinite number of instantiations of $\mathit{erase}(\tau v_1)$, there must exist a $\forall$ type $\tau v_2$ s.t. $\tau v_2$
contains $\mathit{erase}(\tau v_1)$ as a sub-term, every new contour of $\tau v_2$ causes the generation of a
new instantiation of $\mathit{erase}(\tau v_1)$, and $\tau v_2$ has an infinite number of contours. Repeating
this process gives an infinite sequence $\mathit{erase}(\tau v_0), \mathit{erase}(\tau v_1), \ldots, \mathit{erase}(\tau v_i), \ldots$ where
for each $i$, $\tau v_{2i}$ has an infinite number of contours when applying to instantiations of
$\mathit{erase}(\tau v_{2i+1})$, and $\mathit{erase}(\tau v_i) \Rightarrow \mathit{erase}(\tau v_{i+1})$. Since the program is finite, there are
finitely many $\mathit{erase}(\tau v)$ and there must be a cycle in the sequence. Thus, there exists $j$
s.t. $\mathit{erase}(\tau v_{2j}) \Rightarrow \mathit{erase}(\tau v_{2j+1}) \Rightarrow^{+} \mathit{erase}(\tau v_{2j})$ and $\tau v_{2j}$ has an infinite number
of contours for applying to instantiations of $\mathit{erase}(\tau v_{2j+1})$. But, by the definition of
Poly for Terminating CPA, this is impossible. □

A terminating Data-Adaptive CPA analysis can be defined similarly, except that,
besides cycles in the Flow Dependency relation, cycles in the call graph also need to be
detected.

6 Conclusions 

We have defined a polymorphic constrained type-based framework for polyvariant flow
analysis. Some particular contributions include: showing how a type system with
parametric polymorphism may be used to model polyvariance as well as data polymorphism;
modeling nCFA and CPA in our framework; a refinement of CPA in the presence of data
polymorphism; and an approach to ensure the termination of CPA-style analyses.
Abstract. We compare the expressive power of exceptions and
continuations when added to a language with local state in the setting of
operational semantics. Continuations are shown to be more expressive
than exceptions because they can cause a function call to return more
than once, whereas exceptions only allow discarding part of the calling
context.



1 Introduction 

Exceptions are part of nearly all modern programming languages, including
mainstream ones like Java and C++. Continuations are present only in Scheme
and the New Jersey dialect of ML, yet are much more intensely studied by
theoreticians and logicians. The relationship between exceptions and continuations
is not as widely understood as one would hope, partly because continuations, 
though in some sense canonical, are more powerful than would at first appear, 
and because the control aspect of exceptions can be obscured by intricacies of 
typing and syntax. 

We have recently shown that exceptions and continuations, when added to 
a purely functional base language, cannot express each other [11]. That paper 
affords a comparison of, and contrast between, exceptions and continuations 
under controlled laboratory conditions, without any contamination from other 
effects so to speak. In this sequel paper we would like to complete the picture 
by comparing exceptions and continuations in the presence of state. It is known 
(and one could call it “folklore”) that in the presence of storable procedures, 
exceptions can be implemented by storing a current handler continuation. It is 
also plausible that the more advanced uses of continuations cannot be done with 
exceptions, even if state is available too. Hence we would expect a hierarchy 
rather than incomparability in the stateful setting. 

Formally, we compare expressiveness by means of contextual equivalence. For
instance, we showed that $(\lambda x.p\,x\,x)\,M \cong p\,M\,M$ is a contextual equivalence in a
language with exceptions, whereas continuations can break it, so that exceptions
cannot macro-express continuations. Apart from the formal result, we would
like to see the equivalences in the stateless setting of [11] as formalizing, at 
least to some extent, the distinction between the dynamic (exceptions) and the 
static (continuations) forms of control. The equivalences here give a different 
perspective, namely of how both forms of control alter the meaning of procedure 
call. With exceptions, a procedure call may discard part of its calling context; 
with continuations, a procedure call may return any number of times. It could 
be said that this distinction reflects the way that control manipulates the call 
stack: exceptions may erase portions of the stack; continuations may in addition 
copy them. However, we can make this distinction using only fairly high-level 
definitions of languages with exceptions and continuations, and a comparison of 
expressiveness. (Though ideally one would hope for a precise connection between 
the equivalences that hold for the various forms of control and the demands they 
put on storage allocation.) 

The notion of expressiveness used here was already mentioned by Landin [6, 
7], and formalized by Felleisen [3]. The reader should be warned that this notion 
of expressiveness is very different from the one used by Lillibridge [8]. Lillibridge
was concerned with the typing of exceptions in ML, whereas we are concerned 
only with the actual jumping, that is, raising and handling exceptions, and 
invoking continuations, respectively. The typing of exceptions in ML “is totally 
independent of the facility which allows us to raise and handle these wrapped-up 
objects or packets” [1]. While the language for exceptions used here most closely 
resembles ML, we do not rely on typing, so that everything is also applicable 
to the catch/throw construct in LISP [14, 13], as it is essentially a spartan 
exception mechanism without handlers. 

The remainder of the paper is organized as follows. The main constructs and 
their operational semantics are defined in Section 2. We first answer a question 
from [11], by showing that local exceptions are more powerful than global ones 
in Section 3. The main result of the paper is that continuations in the presence of 
state are more powerful than exceptions, which is proved in Section 4. Section 5 
sketches how the result here could fit into a more systematic comparison between 
exceptions and continuations based on how often the current continuation can 
be used. Section 6 concludes. 

2 The Languages and Their Operational Semantics 

We extend the language used in the companion paper [11] with state by adopting 
the “state convention” from the Definition of Standard ML [9]. To avoid clutter, 
the store is elided in the rules unless specified otherwise. Formally a rule 

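\[ \frac{M_1 \Downarrow P_1 \quad \ldots \quad M_n \Downarrow P_n}{M \Downarrow P} \]

is taken to be shorthand for a rule in which the state changes are propagated:

\[ \frac{s_0 \vdash M_1 \Downarrow P_1, s_1 \quad \ldots \quad s_{n-1} \vdash M_n \Downarrow P_n, s_n}{s_0 \vdash M \Downarrow P, s_n} \]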
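Table 1. Natural semantics of the functional subset

\[ \frac{P \Downarrow (\lambda x.P_1) \quad Q \Downarrow V \quad P_1[x := V] \Downarrow R}{(P\,Q) \Downarrow R} \]

\[ \frac{N \Downarrow n}{(\mathrm{succ}\ N) \Downarrow (n+1)} \qquad \frac{N \Downarrow 0}{(\mathrm{pred}\ N) \Downarrow 0} \qquad \frac{N \Downarrow (n+1)}{(\mathrm{pred}\ N) \Downarrow n} \]

\[ \frac{N \Downarrow 0 \quad P_1 \Downarrow R}{(\mathrm{if0}\ N\ \mathrm{then}\ P_1\ \mathrm{else}\ P_2) \Downarrow R} \qquad \frac{N \Downarrow (n+1) \quad P_2 \Downarrow R}{(\mathrm{if0}\ N\ \mathrm{then}\ P_1\ \mathrm{else}\ P_2) \Downarrow R} \]

\[ \frac{}{V \Downarrow V} \qquad \frac{}{(\mathrm{rec}\ f(x).\ P) \Downarrow (\lambda x.\ P[f := (\mathrm{rec}\ f(x).\ P)])} \]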


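Table 2. Natural semantics of exceptions

\[ \frac{N \Downarrow e \quad P \Downarrow V}{(\mathrm{raise}\ N\ P) \Downarrow (\mathrm{raise}\ e\ V)} \]

\[ \frac{N \Downarrow e \quad P \Downarrow V' \quad Q \Downarrow (\mathrm{raise}\ e'\ V'') \quad e \neq e'}{(\mathrm{handle}\ N\ P\ Q) \Downarrow (\mathrm{raise}\ e'\ V'')} \]

\[ \frac{N \Downarrow V \quad P \Downarrow V' \quad Q \Downarrow V''}{(\mathrm{handle}\ N\ P\ Q) \Downarrow V''} \]

\[ \frac{N \Downarrow e \quad P \Downarrow V \quad Q \Downarrow (\mathrm{raise}\ e\ V') \quad (V\,V') \Downarrow R}{(\mathrm{handle}\ N\ P\ Q) \Downarrow R} \]

\[ \frac{N \Downarrow (\mathrm{raise}\ e\ V)}{(\mathit{op}\ N) \Downarrow (\mathrm{raise}\ e\ V)} \qquad \frac{N \Downarrow (\mathrm{raise}\ e\ V)}{(\mathrm{if0}\ N\ \mathrm{then}\ P_1\ \mathrm{else}\ P_2) \Downarrow (\mathrm{raise}\ e\ V)} \]

\[ \frac{P \Downarrow (\mathrm{raise}\ e\ V)}{(P\,Q) \Downarrow (\mathrm{raise}\ e\ V)} \qquad \frac{P \Downarrow V \quad Q \Downarrow (\mathrm{raise}\ e\ V')}{(P\,Q) \Downarrow (\mathrm{raise}\ e\ V')} \]

\[ \frac{N \Downarrow (\mathrm{raise}\ e\ V)}{(\mathrm{raise}\ N\ P) \Downarrow (\mathrm{raise}\ e\ V)} \qquad \frac{N \Downarrow V \quad P \Downarrow (\mathrm{raise}\ e\ V')}{(\mathrm{raise}\ N\ P) \Downarrow (\mathrm{raise}\ e\ V')} \]

\[ \frac{N \Downarrow (\mathrm{raise}\ e'\ V)}{(\mathrm{handle}\ N\ P\ Q) \Downarrow (\mathrm{raise}\ e'\ V)} \qquad \frac{N \Downarrow V \quad P \Downarrow (\mathrm{raise}\ e'\ V')}{(\mathrm{handle}\ N\ P\ Q) \Downarrow (\mathrm{raise}\ e'\ V')} \]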


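Table 3. Natural semantics of state

\[ \frac{s \vdash M \Downarrow a, s_1}{s \vdash (!M) \Downarrow s_1(a), s_1} \qquad \frac{s \vdash M \Downarrow (\mathrm{raise}\ e\ V), s_1}{s \vdash (!M) \Downarrow (\mathrm{raise}\ e\ V), s_1} \]

\[ \frac{s \vdash M \Downarrow V, s_1 \quad a \notin \mathrm{dom}(s_1)}{s \vdash (\mathrm{ref}\ M) \Downarrow a, s_1 + \{a \mapsto V\}} \qquad \frac{s \vdash M \Downarrow (\mathrm{raise}\ e\ V), s_1}{s \vdash (\mathrm{ref}\ M) \Downarrow (\mathrm{raise}\ e\ V), s_1} \]

\[ \frac{s \vdash M \Downarrow a, s_1 \quad s_1 \vdash N \Downarrow V, s_2}{s \vdash (M := N) \Downarrow V, s_2 + \{a \mapsto V\}} \]

\[ \frac{s \vdash M \Downarrow (\mathrm{raise}\ e\ V), s_1}{s \vdash (M := N) \Downarrow (\mathrm{raise}\ e\ V), s_1} \qquad \frac{s \vdash M \Downarrow V', s_1 \quad s_1 \vdash N \Downarrow (\mathrm{raise}\ e\ V), s_2}{s \vdash (M := N) \Downarrow (\mathrm{raise}\ e\ V), s_2} \]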
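Table 4. Evaluation-context semantics of continuations and state

\[ V ::= x \mid n \mid a \mid \lambda x.M \mid \mathrm{rec}\ f(x).\ M \mid \#E \]

\[ E ::= [\cdot] \mid (E\ M) \mid (V\ E) \mid (\mathrm{succ}\ E) \mid (\mathrm{pred}\ E) \mid (\mathrm{if0}\ E\ \mathrm{then}\ M\ \mathrm{else}\ M) \]
\[ \mid (\mathrm{callcc}\ E) \mid (\mathrm{throw}\ E\ M) \mid (\mathrm{throw}\ V\ E) \]
\[ \mid (\mathrm{ref}\ E) \mid (!E) \mid (E := M) \mid (V := E) \]

\[ \begin{array}{lcl}
s, E[(\lambda x.P)\ V] & \to & s, E[P[x := V]] \\
s, E[\mathrm{succ}\ n] & \to & s, E[n+1] \\
s, E[\mathrm{pred}\ 0] & \to & s, E[0] \\
s, E[\mathrm{pred}\ (n+1)] & \to & s, E[n] \\
s, E[\mathrm{if0}\ 0\ \mathrm{then}\ M\ \mathrm{else}\ N] & \to & s, E[M] \\
s, E[\mathrm{if0}\ (n+1)\ \mathrm{then}\ M\ \mathrm{else}\ N] & \to & s, E[N] \\
s, E[\mathrm{rec}\ f(x).\ M] & \to & s, E[M[f := (\lambda x.\ (\mathrm{rec}\ f(x).\ M)\ x)]] \\
s, E[\mathrm{callcc}\ (\lambda x.P)] & \to & s, E[P[x := (\#E)]] \\
s, E[\mathrm{throw}\ (\#E')\ V] & \to & s, E'[V] \\
s, E[\mathrm{ref}\ V] & \to & s + \{a \mapsto V\}, E[a] \quad \text{where } a \notin \mathrm{dom}(s) \\
s, E[!a] & \to & s, E[s(a)] \\
s, E[a := V] & \to & s + \{a \mapsto V\}, E[V]
\end{array} \]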



This version of exceptions (based on the “simple exceptions” of Gunter, Rémy
and Riecke [5]) differs from those in ML in that exceptions are not constructors.
The fact that exceptions in ML are constructors is relevant chiefly if one does
not want to raise them, using exn only as a universal type. For our purposes,
there is no real difference, up to an occasional $\eta$-expansion.

Definition 1. We define the following languages:

— Let $\lambda_v{+}$ be defined by the operational semantics rules in Table 1.

— Let $\lambda_v{+}\mathrm{exn}$ be defined by the operational semantics rules in Tables 1 and 2.

— Let $\lambda_v{+}\mathrm{state}$ be defined by the rules in Table 3 and those in Table 1 subject
to the state convention.

— Let $\lambda_v{+}\mathrm{exn}{+}\mathrm{state}$ be defined by the rules in Table 3, as well as those in
Tables 1 and 2 subject to the state convention.

The rules for state are based on those in the Definition of Standard ML [9] (rules
(99) and (100) on page 42), except that ref, ! and := are treated as special forms,
rather than identifiers. A state is a partial function from addresses to values. For
a term $M$, let $\mathrm{Addr}(M)$ be the set of addresses occurring in $M$. A program is a
closed term $P$ not containing any addresses, that is $\mathrm{Addr}(P) = \emptyset$.

We also need a language with continuations and state: 

Definition 2. Let $\lambda_v{+}\mathrm{cont}{+}\mathrm{state}$ be defined by the operational semantics
in Table 4.

The small-step operational semantics of $\lambda_v{+}\mathrm{cont}{+}\mathrm{state}$ with evaluation
contexts is in the style of Felleisen [12], with store added. Both addresses $a$ and
reified continuations $\#E$ are run-time entities that cannot appear in source
programs.

Let a context C be a term with a hole not containing addresses. 

Definition 3. Two terms $P$ and $P'$ are contextually equivalent, $P \cong P'$, iff for
all contexts $C$, we have $\emptyset \vdash C[P] \Downarrow n, s$ for some integer $n$, iff $\emptyset \vdash C[P'] \Downarrow n, s'$.

Contextual equivalence is defined analogously for the small-step semantics.
However, in the small-step semantics we will be concerned with breaking
equivalences, a strong version of which is the following:

Definition 4. Two terms $P$ and $P'$ can be separated iff there is a context $C$
such that: $\emptyset, C[P] \to^{*} s, n$ for some integer $n$, and $\emptyset, C[P'] \to^{*} s', n'$ with $n \neq n'$.

(Again, the definition for big-step is analogous.) 

Local definitions and sequencing are the usual syntactic sugar: 

(let $x = M$ in $N$) = $(\lambda x.N)\ M$

$(M; N)$ = $(\lambda x.N)\ M$ where $x$ is not free in $N$
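
As a small illustration of this desugaring, the two forms can be mimicked in Python, which also evaluates arguments before the call. The helper names `let` and `seq` are hypothetical, and `seq` takes its second term as a thunk so that it is not evaluated early; this is a sketch of the idea, not part of the paper's formal language.

```python
def let(value, body):
    """(let x = M in N) = (lambda x. N) M: apply the body, as a function of x, to M's value."""
    return body(value)


def seq(first_value, second):
    """(M; N) = (lambda x. N) M with x not free in N: M's value is computed, then ignored."""
    return (lambda _unused: second())(first_value)


log = []
# list.append returns None, so `... or 21` yields 21 after the side effect has run.
result = let(log.append("M evaluated") or 21, lambda x: x + x)
last = seq(log.append("first"), lambda: log.append("second") or "done")
```

Because the argument is evaluated before the function body runs, the side effects of M always precede those of N, which is exactly what sequencing requires.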



3 Local Exceptions Are More Powerful than Global Ones 

In this section, we show that even a small amount of state affects our comparison
of continuations and exceptions. It may be surprising that local (that is, under
a $\lambda$) declarations should have state in them, but local exception declarations
generate new exception names (somewhat like gensym in LISP), and the equality
test implicit in the exception handler is enough to make this observable.

Proposition 1. There are terms that are contextually equivalent in the language
with global exceptions $\lambda_v{+}\mathrm{exn}$, but which can be separated if local exceptions are
added.

Proof. In $\lambda_v{+}\mathrm{exn}$, we have a contextual equivalence
\[ (\lambda x.p\,x\,x)\ M \cong p\,M\,M \]

The proof of [11, Proposition 1] generalizes to the untyped setting. But local
exceptions can break this equivalence: see Figure 1 for a separating context. □

From our perspective, we would maintain that the equivalence holds for the pure 
control aspect of exceptions, and is broken only because local exceptions are a 
somewhat hybrid notion with state in them. 

Since all we need from local exceptions here is that one term evaluates to 
1 and another to 2, we do not give a formal semantics for them, referring the 
reader to the Definition of Standard ML [9] (for a notation closer to the one used 
here, see also [5]). 
fun single m p = let val y = m 0 in p y y end;
fun double m p = p (m 0) (m 0);

fun localnewexn d =
    let
        exception e
        fun r d = raise e
        fun h f x = ((f 0) handle e => x)
    in
        fn q => q r h
    end;

fun separate copier =
    (copier localnewexn)
    (fn q1 => fn q2 =>
        q1 (fn r1 => fn h1 =>
        q2 (fn r2 => fn h2 =>
            h1 (fn d => h2 r1 1) 2)));

separate single;
val it = 1 : int
separate double;
val it = 2 : int



Fig. 1. A separating context using local exceptions in Standard ML 



The point in separating (Figure 1) is that each call of localnewexn generates
a new exception. The handler in h2 can only handle the exception raised
from r1 if h2 and r1 come from the same call of localnewexn, as they do in
separate single, but not in separate double.
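
The same effect can be reproduced in Python, where a class statement inside a function creates a fresh exception class on every call. The following is a rough transliteration of Figure 1, not the paper's code: the names `local_new_exn`, `single`, `double` and `separate` mirror the SML ones, and pairs are used in place of the curried `fn q => q r h` packaging.

```python
def local_new_exn():
    """Each call declares a fresh exception class, like a local exception in ML."""
    class LocalExn(Exception):
        pass

    def r(_unused):
        raise LocalExn()

    def h(f, x):
        # Handle only the exception declared by this very call.
        try:
            return f(0)
        except LocalExn:
            return x

    return (r, h)


def single(m, p):
    y = m()            # one call: both arguments share one exception
    return p(y, y)


def double(m, p):
    return p(m(), m())  # two calls: two distinct exceptions


def separate(copier):
    def p(q1, q2):
        r1, h1 = q1
        _r2, h2 = q2
        # h2 tries to handle what r1 raises; h1 is the fallback handler.
        return h1(lambda _d: h2(r1, 1), 2)

    return copier(local_new_exn, p)
```

Here `separate(single)` lets the inner handler catch the exception and yields 1, while `separate(double)` lets it escape to the outer handler and yields 2.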

Local exceptions are relevant for us for two reasons: first, they make the
equivalence for exceptions used in [11] inapplicable; second, they can to some extent
approximate downward continuations. The example in Figure 1 does perhaps not
witness expressive power in any intuitive sense. A more practical example may be
the following: can one define a function f that passes to some unknown function
g a function h that when called jumps back into f (assuming h is called before
the call of f has terminated, because otherwise this would be beyond
exceptions)? With downward continuations, one can easily do that: in $\lambda_v{+}\mathrm{cont}{+}\mathrm{state}$,
we would write f as $\lambda g.\mathrm{callcc}(\lambda k.\,g\,(\lambda x.\mathrm{throw}\ k\ x))$. Even such pedestrian
control constructs as goto in ALGOL and longjmp() in C could do this. Yet with
the simple version of exceptions we have in $\lambda_v{+}\mathrm{exn}$, a handler in g may catch
whatever exception h wanted to use to jump into f. With local exceptions,
however, f could declare a local exception for h to raise, which would thus be distinct
from any that g could handle. On the other hand, language designers specifically
chose to equip g so that it can intercept jumps from h to f: in ML even local
exceptions can be handled by using a variable (or just a wildcard) pattern in the
handler, while LISP provides unwind-protect.

4 Exceptions Cannot Make Functions Return Twice 

Encodings of exceptions in terms of stored continuations have been known for 
some time, and can probably be regarded as folklore [5]; see also Reynolds’s 
textbook [10]. It would still be worthwhile to analyze encodings of the various 
notions of exceptions in more detail. But the fact that such an encoding is
possible, and that consequently continuations and state are at least as expressive
as exceptions and state, will be treated as a known result here. We will
strengthen it by showing that continuations in the presence of state are strictly more
expressive than exceptions.

Define terms $R_1$ and $R_2$ in $\lambda_v{+}\mathrm{state}$ by

\[ R_j = \lambda z.((\lambda x.\lambda y.(z\,0;\ x := {!y};\ y := j;\ {!x}))\ (\mathrm{ref}\ 0)\ (\mathrm{ref}\ 0)) \]

Informally, the idea is that $j$ is hidden inside $R_j$. As the variables $x$ and $y$ are
local, the only way to observe $j$ would be to run the assignments after the call
to $z$ twice, so that $j$ is first moved into $y$, and then $x$, whose value is returned
at the end. With exceptions, that is impossible.
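
A minimal sketch of this intuition in Python, which has exceptions and mutable state but no first-class continuations. The names `Ref`, `make_R` and `observe` are hypothetical, and trying a couple of sample callees is of course only an illustration, not a substitute for the proof that follows.

```python
class Ref:
    """A mutable cell standing in for an ML reference."""

    def __init__(self, value):
        self.value = value


def make_R(j):
    """Build R_j: call z, then shuffle j through two hidden local cells."""
    def R(z):
        x, y = Ref(0), Ref(0)
        z(0)
        x.value = y.value   # x := !y  (still 0 on the first and only pass)
        y.value = j         # y := j   (j is stored but never read again)
        return x.value      # !x
    return R


def observe(R, z):
    """Run R with callee z and report everything a caller can see."""
    try:
        return ("returned", R(z))
    except Exception as exc:
        return ("raised", type(exc).__name__)


def returns_normally(_unused):
    return 7


def raises(_unused):
    raise ValueError("escape")
```

Whether the callee returns or raises, `observe(make_R(1), z)` and `observe(make_R(2), z)` agree: the assignments run at most once, so `j` never reaches `x`.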

The proof uses a variant of the technique used for exceptions in [11],
extended to deal with the store. First we define a relation needed for the induction
hypothesis:

Definition 5. We define relations $\sim$ and $\sim_A$, where $A$ is a set of addresses, as
follows:

— on terms, let $\sim$ be the least congruence such that $M \sim M$ and $R_j \sim R_{j'}$ for
any integers $j$ and $j'$;

— for stores, let $s \sim_A s'$ iff $A \subseteq \mathrm{dom}(s) = \mathrm{dom}(s')$ and for all $a \in A$, $s(a) \sim
s'(a)$ and $\mathrm{Addr}(s(a)) \subseteq A$;

— for stores together with terms, let $s, M \sim_A s', M'$ iff $s \sim_A s'$ and $M \sim M'$,
and also $\mathrm{Addr}(M) \subseteq A$.

Intuitively, $s, M \sim_A s', M'$ implies that $M$ in store $s$ and $M'$ in store $s'$ are
linked in lockstep; but the stores may differ in addresses outside $A$, which are
inaccessible from $M$.

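Lemma 1. Assume $s, P \sim_A s', P'$ and $s \vdash P \Downarrow Q, s_1$. Then there exist a term
$Q'$, a store $s'_1$ and a set of addresses $A_1$ such that

— $s' \vdash P' \Downarrow Q', s'_1$;

— $s_1, Q \sim_{A_1} s'_1, Q'$;

— $A \subseteq A_1$ and $(\mathrm{dom}(s) \setminus A) \subseteq (\mathrm{dom}(s_1) \setminus A_1)$;

— for all addresses $a \in \mathrm{dom}(s) \setminus A$, the stores satisfy $s_1(a) = s(a)$ and $s'_1(a) = s'(a)$.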
Proof. Proof by induction on the derivation of $s \vdash P \Downarrow Q, s_1$. We assume $s, P \sim_A
s', P'$ and proceed by cases on the last rule applied in the derivation.

Case $P = M\,N$ and $s \vdash M\,N \Downarrow Q, s_4$. The last rule is

\[ \frac{s \vdash M \Downarrow \lambda z.M_1, s_1 \quad s_1 \vdash N \Downarrow V_2, s_2 \quad s_2 \vdash M_1[z := V_2] \Downarrow Q, s_4}{s \vdash M\,N \Downarrow Q, s_4} \]

As $M\,N = P \sim P'$, $P'$ must be of the form $M'\,N'$. By the induction
hypothesis applied to $s \vdash M \Downarrow (\lambda z.M_1), s_1$, we have $s' \vdash M' \Downarrow \lambda z.M'_1, s'_1$ with

\[ s_1, \lambda z.M_1 \sim_{A_1} s'_1, \lambda z.M'_1. \]

There are two possible cases implied by $\lambda z.M_1 \sim \lambda z.M'_1$: either $M_1 \sim M'_1$,
or $\lambda z.M_1 = R_j$ and $\lambda z.M'_1 = R_{j'}$. In the first case, the claim follows by
repeatedly applying the induction hypothesis. So suppose the second, that
$\lambda z.M_1 = R_j$ and $\lambda z.M'_1 = R_{j'}$. We apply the induction hypothesis, giving
us $s'_1 \vdash N' \Downarrow V'_2, s'_2$ with $s_2, V_2 \sim_{A_2} s'_2, V'_2$. Now

\[ M_1[z := V_2] = (\lambda x.\lambda y.(V_2\,0;\ x := {!y};\ y := j;\ {!x}))\ (\mathrm{ref}\ 0)\ (\mathrm{ref}\ 0) \]

This term will allocate two new addresses, so let $a, b \notin \mathrm{dom}(s_2)$. Then $s_2 \vdash
M_1[z := V_2] \Downarrow Q, s_4$ iff

\[ s_2 + \{a \mapsto 0, b \mapsto 0\} \vdash V_2\,0;\ a := {!b};\ b := j;\ {!a} \Downarrow Q, s_4 \]

There are two possible cases, depending on whether $V_2\,0$ in store $s_2 + \{a \mapsto
0, b \mapsto 0\}$ raises an exception or not. First, suppose it does, that is,

\[ s_2 + \{a \mapsto 0, b \mapsto 0\} \vdash V_2\,0 \Downarrow \mathrm{raise}\ e\ V_3, s_3 \qquad (1) \]

As $s_2 + \{a \mapsto 0, b \mapsto 0\}, V_2\,0 \sim_{A_2} s'_2 + \{a \mapsto 0, b \mapsto 0\}, V'_2\,0$, the induction
hypothesis implies

\[ s'_2 + \{a \mapsto 0, b \mapsto 0\} \vdash V'_2\,0 \Downarrow \mathrm{raise}\ e\ V'_3, s'_3 \]

with $s_3, \mathrm{raise}\ e\ V_3 \sim_{A_3} s'_3, \mathrm{raise}\ e\ V'_3$. The exception propagates, devouring
the difference between $j$ and $j'$ in this call of $R_j$; more technically:

\[ \dfrac{\dfrac{\dfrac{s_2 + \{a \mapsto 0, b \mapsto 0\} \vdash V_2\,0 \Downarrow \mathrm{raise}\ e\ V_3, s_3}
{s_2 + \{a \mapsto 0, b \mapsto 0\} \vdash V_2\,0;\ a := {!b} \Downarrow \mathrm{raise}\ e\ V_3, s_3}}
{s_2 + \{a \mapsto 0, b \mapsto 0\} \vdash V_2\,0;\ a := {!b};\ b := j \Downarrow \mathrm{raise}\ e\ V_3, s_3}}
{s_2 + \{a \mapsto 0, b \mapsto 0\} \vdash V_2\,0;\ a := {!b};\ b := j;\ {!a} \Downarrow \mathrm{raise}\ e\ V_3, s_3} \]

That is, $s_2 + \{a \mapsto 0, b \mapsto 0\} \vdash M_1[z := V_2] \Downarrow \mathrm{raise}\ e\ V_3, s_3$, hence the whole
call raises an exception

\[ s_2 + \{a \mapsto 0, b \mapsto 0\} \vdash M\,N \Downarrow \mathrm{raise}\ e\ V_3, s_3 \qquad (2) \]

Analogously for $V'_2$. Letting $Q = \mathrm{raise}\ e\ V_3$ and $s_4 = s_3$, we are done for
this subcase. Now assume $V_2\,0$ does not raise an exception, so that there is
a value $V_3$ returned by the call:

\[ s_2 + \{a \mapsto 0, b \mapsto 0\} \vdash V_2\,0 \Downarrow V_3, s_3 \qquad (3) \]
We apply the induction hypothesis to the call $V_2\,0$, relying on the fact that $V_2$
can only reach addresses in $A_2$, so that it cannot modify the newly allocated
$a$ and $b$:

\[ s_2 + \{a \mapsto 0, b \mapsto 0\}, V_2\,0 \sim_{A_2} s'_2 + \{a \mapsto 0, b \mapsto 0\}, V'_2\,0 \]

The induction hypothesis thus gives us $s'_2 + \{a \mapsto 0, b \mapsto 0\} \vdash V'_2\,0 \Downarrow V'_3, s'_3$
and $s_3, V_3 \sim_{A_3} s'_3, V'_3$. As $b \in \mathrm{dom}(s_2 + \{a \mapsto 0, b \mapsto 0\})$, but $b \notin A_2$, we have
$s_3(b) = 0$, and $s'_3(b) = 0$, and also $b \notin A_3$. Putting the pieces together, we
derive:

\[ s_2 + \{a \mapsto 0, b \mapsto 0\} \vdash (V_2\,0;\ a := {!b};\ b := j;\ {!a}) \Downarrow 0, s_3 + \{b \mapsto j\} \]

hence

\[ s_2 + \{a \mapsto 0, b \mapsto 0\} \vdash (\lambda x.\lambda y.(V_2\,0;\ x := {!y};\ y := j;\ {!x}))\ (\mathrm{ref}\ 0)\ (\mathrm{ref}\ 0) \Downarrow 0, s_3 + \{b \mapsto j\} \]

Analogously

\[ s'_2 + \{a \mapsto 0, b \mapsto 0\} \vdash (V'_2\,0;\ a := {!b};\ b := j';\ {!a}) \Downarrow 0, s'_3 + \{b \mapsto j'\} \]

hence

\[ s'_2 + \{a \mapsto 0, b \mapsto 0\} \vdash (\lambda x.\lambda y.(V'_2\,0;\ x := {!y};\ y := j';\ {!x}))\ (\mathrm{ref}\ 0)\ (\mathrm{ref}\ 0) \Downarrow 0, s'_3 + \{b \mapsto j'\} \]

Thus

\[ s \vdash M\,N \Downarrow 0, s_3 + \{b \mapsto j\} \qquad (4) \]

and $s' \vdash M'\,N' \Downarrow 0, s'_3 + \{b \mapsto j'\}$ with $s_3 + \{b \mapsto j\}, 0 \sim_{A_3} s'_3 + \{b \mapsto j'\}, 0$, as
required. This is the linchpin of the whole proof: $b$ holds $j$ or $j'$, respectively;
but that is of no consequence, because $b$, lying outside of $A_3$, is garbage.

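Case $P = {!M}$ and $s \vdash {!M} \Downarrow s_1(a), s_1$. Hence $s \vdash M \Downarrow a, s_1$. As ${!M} \sim P'$, $P'$
must be of the form ${!M'}$ with $M \sim M'$. By the induction hypothesis, $s' \vdash
M' \Downarrow Q', s'_1$ with $s_1, a \sim_{A_1} s'_1, Q'$, and $\mathrm{Addr}(s(a)) \subseteq A_1$. As this implies
$a \sim Q'$, we have $a = Q'$, so that $s' \vdash M' \Downarrow a, s'_1$, which implies $s' \vdash {!M'} \Downarrow
s'_1(a), s'_1$. As $\{a\} = \mathrm{Addr}(Q') \subseteq A_1$, $s_1(a) \sim s'_1(a)$. Thus $s_1, s_1(a) \sim_{A_1} s'_1, s'_1(a)$,
as required.

Case $P = \mathrm{ref}\ M$ and $s \vdash \mathrm{ref}\ M \Downarrow a, s_1 + \{a \mapsto V\}$. Hence $s \vdash M \Downarrow V, s_1$ with
$a \notin \mathrm{dom}(s_1)$. As $\mathrm{ref}\ M \sim P'$, $P'$ must be of the form $\mathrm{ref}\ M'$ with $M \sim M'$.
By the induction hypothesis, $s' \vdash M' \Downarrow V', s'_1$ with $s_1, V \sim_{A_1} s'_1, V'$ where
$A \subseteq A_1$. Thus $s' \vdash \mathrm{ref}\ M' \Downarrow a, s'_1 + \{a \mapsto V'\}$. (We can pick the same $a$,
because $a \notin \mathrm{dom}(s'_1) = \mathrm{dom}(s_1)$.) Thus, $s' \vdash \mathrm{ref}\ M' \Downarrow a, s'_1 + \{a \mapsto V'\}$ with

\[ s_1 + \{a \mapsto V\}, a \sim_{A_1 \cup \{a\}} s'_1 + \{a \mapsto V'\}, a \]

Furthermore, $A \subseteq A_1 \cup \{a\}$ and $\mathrm{dom}(s) \setminus A \subseteq \mathrm{dom}(s_1 + \{a \mapsto V\}) \setminus (A_1 \cup \{a\})$.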
Case $P = (M := N)$ and $s \vdash M := N \Downarrow V, s_2 + \{a \mapsto V\}$. Hence $s \vdash M \Downarrow a, s_1$ and
$s_1 \vdash N \Downarrow V, s_2$. As $M := N \sim P'$, $P'$ must be of the form $M' := N'$, with
$M \sim M'$ and $N \sim N'$. Applying the induction hypothesis to $s \vdash M \Downarrow a, s_1$
gives us $Q'$ and $s'_1$ such that $s' \vdash M' \Downarrow Q', s'_1$ and $s_1, a \sim_{A_1} s'_1, Q'$. So $a \sim Q'$,
which means $Q' = a$. Applying the induction hypothesis to $s_1 \vdash N \Downarrow V, s_2$
and $s_1, N \sim_{A_1} s'_1, N'$ gives us $V'$ and $s'_2$ such that $s_2, V \sim_{A_2} s'_2, V'$. Thus
$s' \vdash M' := N' \Downarrow V', s'_2 + \{a \mapsto V'\}$ with

\[ s_2 + \{a \mapsto V\}, V \sim_{A_2} s'_2 + \{a \mapsto V'\}, V' \]

as required. Assume $b$ is an address with $b \in \mathrm{dom}(s) \setminus A$. Then $b \notin A_2$, and
$s_2(b) = s_1(b) = s(b)$. Because $a \in A_2$, we have $b \neq a$, so that the store
$s_2 + \{a \mapsto V\}$ still maps $b$ to $s(b)$.

Otherwise. The last rule in the derivation must be of the form

\[ \frac{s \vdash P_1 \Downarrow Q_1, s_1 \quad \ldots \quad s_{n-1} \vdash P_n \Downarrow Q_n, s_n}{s \vdash P \Downarrow Q, s_n} \]

Observe that the $P_i$ in the antecedents are the immediate subterms of the $P$
in the conclusion, and that conversely the $Q$ in the conclusion is assembled
from subterms of $P$ and some of the $Q_i$ in the antecedents. Hence:

\[ \mathrm{Addr}(P_1) \cup \ldots \cup \mathrm{Addr}(P_n) \subseteq \mathrm{Addr}(P) \]

\[ \mathrm{Addr}(Q) \subseteq \mathrm{Addr}(P) \cup \mathrm{Addr}(Q_1) \cup \ldots \cup \mathrm{Addr}(Q_n) \]

Because $s, P \sim_A s', P'$, we have $s \sim_A s'$ and $P \sim P'$. The case $P = R_j$ is
trivial; otherwise we have $P \sim P'$ due to congruence, so there are $P'_1, \ldots, P'_n$
with $P_i \sim P'_i$. Now $s, P_1 \sim_A s', P'_1$ (because $\mathrm{Addr}(P_1) \subseteq \mathrm{Addr}(P) \subseteq A$). By
the induction hypothesis, there exist $Q'_1$, $s'_1$ and $A_1$ such that $s_1, Q_1 \sim_{A_1}
s'_1, Q'_1$ and $A \subseteq A_1$. Hence $s_1, P_2 \sim_{A_1} s'_1, P'_2$, so that we can apply the
induction hypothesis again to $s_1 \vdash P_2 \Downarrow Q_2, s_2$, and so on for all the antecedents.
Finally, let $Q'$ be built up from the $Q'_i$ in the same way as $Q$ is built up from
the $Q_i$. By congruence, we have $Q \sim Q'$. As $s_n \sim_{A_n} s'_n$ and $\mathrm{Addr}(Q) \subseteq A_n$, we
have $s_n, Q \sim_{A_n} s'_n, Q'$ as required. □

We have thus shown that terms containing $R_1$ and $R_2$, respectively, proceed in
lockstep. This implies that the $R_j$ are contextually equivalent:

Lemma 2. $R_1$ and $R_2$ are contextually equivalent in $\lambda_v{+}\mathrm{exn}{+}\mathrm{state}$.

Proof. Let $C$ be a context. Suppose $\emptyset \vdash C[R_1] \Downarrow n, s$ for some integer $n$. We
need to show that $C[R_2]$ also reduces to $n$. First, note that because $\sim$ on terms
is defined to be a congruence with $R_1 \sim R_2$, we have $C[R_1] \sim C[R_2]$. As neither
of these terms contains any addresses, they are related in the empty store with
respect to the empty set of addresses, that is $\emptyset, C[R_1] \sim_{\emptyset} \emptyset, C[R_2]$. By Lemma 1,
we have $\emptyset \vdash C[R_2] \Downarrow Q', s'$, for some $s'$, $Q'$ and $A$ such that $s, n \sim_A s', Q'$. This
implies $n \sim Q'$, so that $n = Q'$. The argument for showing that $\emptyset \vdash C[R_2] \Downarrow n, s$
implies that $C[R_1]$ in the empty store also reduces to $n$ is symmetric. □




On Exceptions Versus Continuations in the Presence of State



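open SMLofNJ.Cont;  (* callcc and throw live in this structure in SML/NJ *)

(* R j hides j in two local references; only a second pass moves it into x *)
fun R j z = (fn x => fn y => (z 0; x := !y; y := j; !x)) (ref 0) (ref 0);

fun C Rj =
    callcc (fn top =>
        let
            val c = ref 0    (* counts how often the body below is reached *)
            val s = ref top  (* will hold the continuation saved inside Rj *)
            val d = Rj (fn p => callcc (fn r => (s := r; 0)))
        in
            (c := !c + 1;
             if !c = 2 then d else throw (!s) 0)
        end);

C (R 1);
val it = 1 : int
C (R 2);
val it = 2 : int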



Fig. 2. A separating context using continuations and state in SML/NJ 



Note that the proof would still go through if we changed the notion of observation 
to termination, or if we restricted to the typed subset. 

It remains to show that the two terms that are indistinguishable with
exceptions and state can be separated with continuations and state. To separate,
the argument to $R_j$ should save its continuation, then restart that continuation
once, so the assignments get evaluated twice, thereby assigning $j$ to $x$, and thus
making the concealed $j$ visible to the context.
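
The idea can be traced in Python with a hand-written continuation-passing encoding. Python has no callcc, so this sketch assumes that every function receives its continuation `k` explicitly; the names `make_R`, `separate` and `run_once` are hypothetical, and the code is an illustration of the mechanism rather than a rendering of the formal semantics.

```python
def make_R(j):
    """R_j in continuation-passing style: k receives the final value of !x."""
    def R(z, k):
        x, y = [0], [0]          # two fresh one-element lists act as references

        def after_call(_ignored):
            x[0] = y[0]          # x := !y
            y[0] = j             # y := j
            return k(x[0])       # return !x to the caller's continuation

        return z(0, after_call)  # call z, handing it the rest of R's body
    return R


def separate(R):
    """Context whose callee saves its continuation and re-enters it once."""
    saved = []
    count = [0]

    def z(_arg, k):
        saved.append(k)          # save the continuation of the call to z
        return k(0)              # first return

    def rest(d):
        count[0] += 1
        if count[0] == 2:
            return d             # second time round: d now holds j
        return saved[0](0)       # make the call to z return a second time

    return R(z, rest)


def run_once(R):
    """Control-free context: z returns exactly once, so j stays hidden."""
    return R(lambda _arg, k: k(0), lambda d: d)
```

On the first pass `x` receives 0 and `y` receives `j`; re-entering the saved continuation runs the assignments again on the same cells, so `x` receives `j`. Hence `separate(make_R(1))` yields 1 and `separate(make_R(2))` yields 2, while `run_once` yields 0 for both.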

Lemma 3. In $\lambda_v{+}\mathrm{cont}{+}\mathrm{state}$, $R_1$ and $R_2$ can be separated: there is a context
$C[\cdot]$ such that

\[ \emptyset, C[R_1] \to^{*} s_1, 1 \]

\[ \emptyset, C[R_2] \to^{*} s'_1, 2 \]

This is actually strictly stronger than $R_1$ and $R_2$ not being contextually
equivalent (and it is machine-checkable by evaluation). We omit the lengthy calculation
here, but see Figure 2 for the separating context written in Standard ML of New
Jersey. From Lemmas 2 and 3, we conclude our main result:

Proposition 2. There are $\lambda_v{+}\mathrm{state}$ terms that are contextually equivalent in
$\lambda_v{+}\mathrm{exn}{+}\mathrm{state}$, but which can be separated in $\lambda_v{+}\mathrm{cont}{+}\mathrm{state}$.

Combined with the known encodings of exceptions in terms of continuations and 
state, Proposition 2 means that continuations in the presence of state are strictly 
more expressive than exceptions. 
5 Exceptions Can Discard the Calling Context

We have established that continuations are more expressive than exceptions by
showing how they affect function calls: using continuations, a call can return
more than once. In this section, we aim at an analogous result for showing how
exceptions give rise to added power compared to a language without control:
using exceptions, a function call may discard part of the calling context. To put
it facetiously as a contest between a term and its context, in the previous section
we concocted a calling context whose main ingredient

\[ \ldots\ z\,0;\ x := {!y};\ y := j;\ {!x}\ \ldots \]

was chosen such that something good (for separating) would happen if only the
callee $z$ could return twice. Now we need a calling context in which something bad
happens if the callee returns at all. One such context is given by sequencing with
divergence. (The callee could avoid ever returning to the divergence by diverging
itself, but for separating that would defeat the purpose.) More formally, there are
terms that are contextually equivalent in the language with state but no control,
and which can be separated in the language with exceptions and state (in fact,
in any language with control). Let $\Omega$ be the diverging term $((\mathrm{rec}\ f(x).\ f\,x)\ 0)$.
The recursion construct is used here so that everything generalizes to the typed
subset of $\lambda_v{+}\mathrm{state}$; if we are only concerned with the untyped language, we
could just as well put $\Omega = (\lambda x.x\,x)(\lambda x.x\,x)$. Analogously to Lemma 2, we have

Lemma 4. $(M; \Omega)$ and $(N; \Omega)$ are contextually equivalent in $\lambda_v{+}\mathrm{state}$.

Proof. (Sketch) Let $\sim$ be the least congruence such that $M \sim M$ and $M; \Omega \sim
N; \Omega$ for any $M$ and $N$. Let $\sim$ be defined on states pointwise, and let $s, P \sim s', P'$
iff $s \sim s'$ and $P \sim P'$. As in Lemma 1, we need to show that if $s, P \sim s', P'$
and $s \vdash P \Downarrow Q, s_1$, there is a $Q'$ such that $s' \vdash P' \Downarrow Q', s'_1$ with $s_1, Q \sim s'_1, Q'$.
The only non-trivial case is $P = (M; \Omega)$ and $P' = (N; \Omega)$. Suppose one of them
reduces to an integer. If we do not have control constructs, that can only be the case
if $\Omega$ reduces to a value. But there is no $V$ such that $s \vdash \Omega \Downarrow V, s_1$. (For suppose
there were: there would be a derivation of minimal height, which would have to
contain a smaller one.) □

The proof is simpler than for exceptions because when we relate two terms
$(M; \Omega)$ and $(N; \Omega)$ it does not matter what $M$ and $N$ do, or what storage they
allocate, as the $\Omega$ prevents any observation.

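Lemma 5. $(M; \Omega)$ and $(N; \Omega)$ can be separated in $\lambda_v{+}\mathrm{exn}{+}\mathrm{state}$.

Proof. Let

M = raise e 1
N = raise e 2
C = handle e [·] (λx.x)

Then we have $\emptyset \vdash C[M; \Omega] \Downarrow 1, \emptyset$ and $\emptyset \vdash C[N; \Omega] \Downarrow 2, \emptyset$ in $\lambda_v{+}\mathrm{exn}{+}\mathrm{state}$. □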
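
The separation can be mimicked in Python. This is a hedged sketch rather than the paper's formal language: the exception class `E`, the helpers `omega`, `then_diverge` and `context`, and the use of `try`/`except` with an identity handler are all illustrative assumptions.

```python
class E(Exception):
    """A single global exception carrying a payload, standing in for e."""

    def __init__(self, payload):
        super().__init__(payload)
        self.payload = payload


def omega():
    """The diverging term: never returns if it is ever reached."""
    while True:
        pass


def M():
    raise E(1)


def N():
    raise E(2)


def then_diverge(term):
    """The sequencing (term; Omega): run the term, then diverge."""
    term()
    omega()


def context(body):
    """Handle E around the body; the handler is the identity on the payload."""
    try:
        return body()
    except E as exc:
        return exc.payload
```

Because the raise discards the rest of the sequence, `omega()` is never reached: `context(lambda: then_diverge(M))` yields 1 and the same context around `N` yields 2. Without the handler and the raise, neither call could return anything at all.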
Proposition 3. There are two terms in $\lambda_v{+}$ that are contextually equivalent
in $\lambda_v{+}\mathrm{state}$, but which can be separated in $\lambda_v{+}\mathrm{exn}{+}\mathrm{state}$.

So far we have used operational semantics and contextual equivalence as 
a kind of probe to observe what control constructs can and cannot do. The 
astute reader may however have begun to suspect what the preoccupation with 
discarding the current continuation, or using it more than once, is driving at. 
In the remainder of this section, we sketch how the earlier material fits in with 
linearity in the setting of continuation semantics. 

It is evident that in the continuation semantics of a language without control
operators the current continuation is used in a linear way. For the function type
we have

$[\![A \to B]\!] = ([\![B]\!] \to \mathrm{Ans}) \multimap ([\![A]\!] \to \mathrm{Ans})$

In a language with callcc, the $\multimap$ would have to be replaced by a $\to$, because
the current continuation could be discarded or copied. Domain-theoretically, the
linear arrow $\multimap$ can be interpreted as strict function space. So in the case of
$M; \Omega$, the meaning of a looping term is $[\![\Omega]\!] = \bot$, and because

$[\![M]\!] : \mathrm{Env} \to ([\![B]\!] \to \mathrm{Ans}) \multimap \mathrm{Ans}$

is strict in its continuation argument, it preserves $\bot$. So it is immediate that
$[\![M; \Omega]\!] = \bot = [\![N; \Omega]\!]$

Moreover, this argument is robust in the sense that it works the same in the
presence of state. In the semantics of a language with state, expression continuations
take the store as an additional argument, so that the meaning of $M$ is
now:

$[\![M]\!] : \mathrm{Env} \to \mathrm{Store} \to ([\![B]\!] \to \mathrm{Store} \to \mathrm{Ans}) \multimap \mathrm{Ans}$

This is still strict in its continuation argument, mapping the divergent continuation
$\bot$ to $\bot$.

All this requires little more than linear typechecking of the CPS transform. 
What seems encouraging, however, is that exceptions begin to fit into the same 
framework. For a language with exceptions or dynamic catch, the continuation 
semantics passes a current handler continuation. Here the current continuation 
and the handler continuation together are subject to linearity (this linearity is 
joint work in progress with Peter O’Hearn and Uday Reddy, which may appear 
elsewhere). Assuming that all exceptions are injected into some type E (like exn 
in ML), the linearity is seen most clearly in the function type: 

$[\![A \to B]\!] = (([\![B]\!] \to \mathrm{Ans}) \mathbin{\&} ([\![E]\!] \to \mathrm{Ans})) \multimap ([\![A]\!] \to \mathrm{Ans})$

(Note that this linear use of non-linear continuations is quite different from 
“linear continuations” [4]). Again the linearity would remain the same if state 
were added to the continuations. The current continuation can be discarded in 
favour of the handler, but never used twice. Exceptions thus occupy a middle 
ground between no control operators (linear usage of the current continuation)
and first-class continuations (intuitionistic, that is unrestricted, usage). For this
reason we regard Lemma 2, which confirms that with exceptions no function call 
can return twice, as more than a random equivalence: it seems to point towards 
deeper structural properties of control made observable by the presence of state 
(in that the ref construct allowed us to “stamp” continuations uniquely, and 
then to count their usage with assignments). 

6 Conclusions and Directions for Further Research 

It is striking how sensitive the comparison between exceptions and continuations 
is to the chosen measure of expressiveness: in Lillibridge’s terms, “exceptions are 
strictly more powerful” than continuations [8]; in terms of contextual equivalence 
and in the absence of state they are incomparable [11]; while in the presence of 
state, continuations are strictly more expressive than exceptions. The last of 
these is perhaps the least surprising because closest to programming intuition. 

Each of these notions is to some extent brittle. For instance, comparisons of 
expressiveness based on the ability to encode recursion are inapplicable if the 
language under consideration already has recursion — and in the presence of state 
(including storable procedures) that is inevitable, as one can use Landin’s technique
of “tying a knot in the store” to define the “imperative Y-combinator”.
On the other hand, the technique of witnessing expressive power by breaking 
equivalences, while more widely applicable, is not completely robust either, if 
other effects are added to the language that already break the equivalence: 
compare Proposition 1 and [11, Proposition 1]. (Equivalences may be broken 
for uninteresting as well as interesting reasons.) Furthermore, while we would 
claim that Proposition 2 confirms and backs up programming intuition, it can 
hardly be said to express the difference between exceptions and continuations. 
A type system for the restricted (linear or affine) use of the current continua- 
tion would come much closer to achieving this. Ideally, such a linear typing for 
continuation-passing style together with typed equivalences of the target langu- 
age should entail the equivalences considered here; we hope that our results will 
give such a unified treatment something to aim for. It has been suggested to us 
that “of course exceptions are weaker — they’re on the stack”. Some substance 
might conceivably be added to such statements if it could be shown that linearity 
in the use of continuations by dynamic control constructs is what allows control 
information to be stack-allocated (see also [2, 15]). 
Equational Reasoning for Linking
with First-Class Primitive Modules

J. B. Wells and René Vestergaard
Abstract. Modules and linking are usually formalized by encodings
which use the λ-calculus, records (possibly dependent), and possibly
some construct for recursion. In contrast, we introduce the m-calculus,
a calculus where the primitive constructs are modules, linking, and the
selection and hiding of module components. The m-calculus supports
smooth encodings of software structuring tools such as functions (λ-calculus),
records, objects (ς-calculus), and mutually recursive definitions.
The m-calculus can also express widely varying kinds of module
systems as used in languages like C, Haskell, and ML. We prove the
m-calculus is confluent, thereby showing that equational reasoning via the
m-calculus is sensible and well behaved.



1 Introduction 

A long version of this paper [44] which contains full proofs, more details and ex- 
planations, and comparisons with more calculi (including the calculus of Ancona 
and Zucca [5]), is available at http://www.cee.hw.ac.uk/~jbw/papers/. 



1.1 Support for Modules in Established Languages 

All programming languages need support for modular programs. For languages 
like C, conventions outside the definition of the language provide this support. 
Each source file is compiled to an object (“.o”) file which plays the role of the
module. The namespace of modules is simply the file system and linking of mo- 
dules is specified via extra-linguistic mechanisms such as makefiles. Connections 
are hard-wired to the component name rather than the module name: If module 
X uses module Y, modules Z and W supplying components with the same names 
as those of Y can be substituted for Y. There is a single global namespace for 
component names. Mutual dependencies between modules are possible, but there
is no mechanism for black-box reuse of modules and no support for hierarchical 
structuring of modules within modules. 

Languages like Ada [10], Modula-3 [26], and Haskell [1] support a kind of 
module which we will call packages. With packages, there is a flat namespace of 
modules; by convention module names correspond to filenames. Connections are 
hard-wired to module names: If module X uses module Y, then any replacement
for Y must also be named Y and support at least the components used by X. As
with C, mutual dependencies are supported but black-box reuse and hierarchical
structuring are not.

G. Smolka (Ed.): ESOP/ETAPS 2000, LNCS 1782, pp. 412-428, 2000.

The Standard ML language [36] has a very sophisticated module system 
which supports functions from modules to modules. There is again a namespace 
of modules, but modules can be nested hierarchically. Connections can be speci- 
fied by components of module X referring to a previously defined module Y by 
name. Connections can also be specified by defining a functor, a function from 
modules to modules: If module X depends on a module named Y, then a functor 
F can be defined whose meaning is the function $(\lambda Y.X)$. The functor F can be
applied to other modules to yield new concrete modules. This provides flexibi- 
lity in linking modules. Although ML supports black-box reuse and hierarchical 
structuring, mutually recursive modules are not allowed. (Current research is 
addressing this issue, e.g., [15].) 

1.2 Reasonable Goals for a Module Formalism 

The wide variety of existing module systems have evolved to satisfy a number of 
goals. We have designed a formal system, the m-calculus, for specifying and rea- 
soning about the behavior of such module systems. In designing the m-calculus, 
we believed that it should satisfy as many of the following goals as possible: 

Reuse without copying or modification: It should be possible (1) to use an 
individual module more than once in a program, (2) for each use of a module 
to be connected to other modules in different ways, and (3) for this to be done 
without changing or duplicating the source code of the module. This is called 
“black-box reuse” or extensibility [32]. Satisfying this requires that inter-module
connections need not be specified inside the modules. We handle this in our 
m-calculus with incomplete (or abstract) modules and a linking operator. 

Modules within the language: It should be possible to represent modules and 
linking together with the features of a core language in a single formalism. Rea- 
soning about the behavior of real systems requires reasoning about all of the 
components of the real system simultaneously. Satisfying this goal requires eit- 
her (1) that the module formalism should be able to represent core language 
features or (2) that it should be possible to combine the module formalism with 
formal systems for core languages. For our m-calculus we prefer approach (1) 
although approach (2) should be possible for many core languages. 

First-class modules: It should be possible (1) for linking of modules to depend 
on arbitrary computations, (2) for modules to be created and loaded dynamically, 
(3) for modules to be passed as parameters and stored in data structures. This 
kind of power is necessary for reasoning about dynamic linking, a feature which 
is used in many C implementations on an ad hoc basis and is even appearing in 
language definitions such as that of Java [25]. Satisfying this requires either that 
the module formalism should support general computation or that it should be 
able to interact with the formalism used to represent the core language. 

Closer fit to real systems: The module formalism should closely fit the actual 
features of real systems. For example, this means that the coding of modules 
and linking via λ-calculus, records, and a fix-point operator is inappropriate
and cumbersome for languages with package-based module systems. This also 
means that the module formalism should have direct support for features of 
existing module systems such as mutual dependencies between modules as well 
as hierarchical structuring of modules. Our m-calculus easily models all three 
styles of module system that were described above. (Note that we do not deal 
with type issues in this paper.) 

Sound and flexible equational reasoning: The module formalism should easily 
support (1) defining how a particular program will behave and (2) understan- 
ding the effect of program transformations. While many techniques have been 
developed for achieving (1), a particularly simple method is to define a reduction 
semantics, i.e., to define a set of evaluation contexts and a set of program-to- 
program rewrite rules. If this method is followed, (2) can be achieved by allowing 
the use of the rewrite rules in any context, not just in evaluation contexts, pro- 
vided the consistency of the rules can be established. For our m-calculus, we 
establish internal consistency of the rewrite rules by proving the system is
confluent.



1.3 A More General Notion of Module 

The key to achieving the above-mentioned goals in the m-calculus is the use of a 
more general notion of module together with a linking operation. An incomplete 
or abstract module (introduced as a mixin module or a mixin in [4], formalized 
in a calculus in [5], and related to the notions of mixin in [17, 18, 13, 12]) is a 
collection of components of which some are exported (externally visible), some 
are private, and some are declared but not defined. We call the latter deferred 
components. For example, consider the following incomplete modules M1 and
M2, where N(f,g,i) is an expression that depends on f, g, and i and similarly
for O(h) and P(f,i):



M1 = (module
        exported f = N(f,g,i)
        deferred g
        deferred h
        private  i = O(h))

M2 = (module
        deferred f
        exported g = P(f,i)
        deferred h
        private  i = Q)



Although the module components are named, the modules themselves do not 
bear names, i.e., they are anonymous, like abstractions in the λ-calculus [9]. In
the m-calculus, we would write the above as: 



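$M_1 = \{f \triangleright w = N(w,x,z),\ g \triangleright x = \bullet,\ h \triangleright y = \bullet,\ \_ \triangleright z = O(y)\}$

$M_2 = \{f \triangleright w = \bullet,\ g \triangleright x = P(w,z),\ h \triangleright y = \bullet,\ \_ \triangleright z = Q\}$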

In the m-calculus, each component has separate external and internal names
from different namespaces (like in [27]). The internal names are subject to
α-conversion and are necessary to support correctness of substitution in the
m-calculus. The private components have only an internal name; the label “$\_$”
means “no name”. Using standard m-calculus abbreviations, we can write the
component $(\_ \triangleright z = O(y))$ as simply $(z = O(y))$. The component body “$\bullet$”
indicates a deferred component where the body needs to be filled in by linking.

The meaning of deferred components is established by the linking operation. 
The result of the operation of linking $M_1$ and $M_2$, written $M_1 \oplus M_2$, is the new
module $M_3$:

M3 = (module
        exported f  = N(f,g,i)
        exported g  = P(f,i')
        deferred h
        private  i  = O(h)
        private  i' = Q)

In linking, deferred components are concreted by exported components of the 
other module. The two modules must not export components with the same 
name. Private components get renamed as necessary to avoid conflicts. Mutually
recursive intermodule dependencies are supported: the example f and
g components above depend on each other. In the m-calculus, $M_3$ is:

$M_3 = \{f \triangleright w = N(w,x,z),\ g \triangleright x = P(w,z'),\ h \triangleright y = \bullet,\ \_ \triangleright z = O(y),\ \_ \triangleright z' = Q\}$

The internal name of a component whose name does not match a component
in the other module can be α-converted to a fresh name to avoid conflicts.
The example does not illustrate this, but internal names of components with
matching external names are α-converted to be the same to enable linking. In
the m-calculus, $M_3$ being the result of $M_1 \oplus M_2$ is expressed by the single rewrite
step $M_1 \oplus M_2 \to M_3$.
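The linking step just described can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the paper: components are modelled as hypothetical `(label, binder, body)` triples, `None` stands for both the “no name” label and the empty body, bodies are opaque strings, and the binders are assumed to be α-converted already (shared names use the same binder, all other binders are distinct). The freshness side conditions of the actual rule are not checked.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding (not from the paper): a component is
# (label, binder, body); label None is the "no name" label, and
# body None is the empty body of a deferred component.
Component = Tuple[Optional[str], str, Optional[str]]


def link(d1: List[Component], d2: List[Component]) -> List[Component]:
    """Link two collections whose binders are already alpha-converted."""
    named1 = {c[0]: c for c in d1 if c[0] is not None}
    named2 = {c[0]: c for c in d2 if c[0] is not None}
    shared = sorted(named1.keys() & named2.keys())
    # Components whose label is not shared are copied unchanged.
    result = [c for c in d1 if c[0] not in shared]
    result += [c for c in d2 if c[0] not in shared]
    for f in shared:
        (_, x1, b1), (_, x2, b2) = named1[f], named2[f]
        if x1 != x2:
            raise ValueError(f"binders of {f} must be alpha-converted to agree")
        if b1 is not None and b2 is not None:
            raise ValueError(f"both modules export {f}")
        # Keep whichever body is non-empty (if any).
        result.append((f, x1, b1 if b2 is None else b2))
    return result


M1 = [("f", "w", "N(w,x,z)"), ("g", "x", None), ("h", "y", None), (None, "z", "O(y)")]
# M2 after renaming its private binder z to z' to avoid a clash with M1.
M2 = [("f", "w", None), ("g", "x", "P(w,z')"), ("h", "y", None), (None, "z'", "Q")]
M3 = link(M1, M2)
```

Up to component order, `M3` holds the five components of the linked module above: `f` and `g` are now both exported, `h` stays deferred, and both private components survive.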

In addition to modules (which may be incomplete) and linking, only two other
kinds of operations are needed for the m-calculus. One is selecting a component
of a module, written $M.f$. The other needed operations are component hiding
and sieving, written $M \backslash f$ and $M \backslash{-\mathcal{F}}$, necessary for certain kinds of namespace
management. (There is also a “letrec” construct $(M \mid D)$ which we could have
chosen to encode as $\{f \triangleright x = M, D\}.f$.)

1.4 Contributions of This Paper 

In section 2, we define the m-calculus, a calculus with modules and linking as 
primitive constructs. In the m-calculus modules are first-class. In section 3, we 
illustrate how various program construction mechanisms and module systems 
can be smoothly encoded in the m-calculus. In section 4, we give an overview of 
the proof of confluence, the bulk of which is treated in [44]. Confluence shows 
that equational reasoning via the m-calculus is sensible and well behaved and 
effectively means that rewriting is “meaning”-preserving. The m-calculus is the 
first calculus of linking for first-class primitive modules which has been proved 
confluent. (Modules are not first-class in [14, 35] and rewriting is not proven 
sound in [5].) In addition, in section 5, we discuss the related work. 
As limitations, this paper does not deal with issues of types, strict evaluation,
imperative effects, or classes and subclassing. As the λ-calculus serves for
functions, the m-calculus serves as a theoretical foundation for examining the
essence of modularity and linking. Analyses of further issues can be built on the
m-calculus as they have been built on the λ-calculus.

1.5 Acknowledgements 

We thank Zena Ariola and Lyn Turbak for inspirational discussions. 



2 The m-Calculus 

2.1 Syntax: Preterms and Raw Terms 

The preterms of the m-calculus (the members of the set PreTerm) are given by
the following grammar for $M$:



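\[
\begin{array}{rcll}
x, y, z & \in & \mathrm{Var} & \text{(variables)} \\
f, g, h & \in & \mathrm{CompName} & \text{(component names)} \\
\mathcal{F} & \subseteq & \mathrm{CompName} & \text{(sets of component names)} \\
F & ::= & f \mid \_ & \text{(component label)} \\
B & ::= & M \mid \bullet & \text{(component body)} \\
c & ::= & (F \triangleright x = B) & \text{(component)} \\
D & ::= & c_1, \ldots, c_n \ \text{where } n \ge 0 & \text{(component collection)} \\
M, N & ::= & x & \text{(variable)} \\
 & \mid & (M \backslash f) & \text{(component hiding)} \\
 & \mid & (M \backslash{-\mathcal{F}}) & \text{(component sieving)} \\
 & \mid & (M \oplus N) & \text{(linking)} \\
 & \mid & (M.f) & \text{(component selection)} \\
 & \mid & \{D\} & \text{(module)} \\
 & \mid & (M \mid D) & \text{(letrec)}
\end{array}
\]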



Let $<$ when used on component names be some strict total order. The following
operations on components and component collections are defined. Given
a component $c = (F \triangleright x = B)$, we define $\mathrm{Label}(c) = F$, $\mathrm{Name}(c) = \mathrm{Label}(c)$
if $\mathrm{Label}(c) \neq \_$ (otherwise undefined), $\mathrm{Binder}(c) = x$, and $\mathrm{Body}(c) = B$. Given
a component collection $D = c_1, \ldots, c_n$, we define $|D| = n$, $D[i] = c_i$ if
$1 \le i \le n$ and is otherwise undefined, $D[i := c] = c_1, \ldots, c_{i-1}, c, c_{i+1}, \ldots, c_n$ if
$1 \le i \le n$ (otherwise undefined), $\mathrm{Names}(D) = \{\mathrm{Label}(c_1), \ldots, \mathrm{Label}(c_n)\} \setminus \{\_\}$,
and $\mathrm{Binders}(D) = \{\mathrm{Binder}(c_1), \ldots, \mathrm{Binder}(c_n)\}$. Let $D[I] = D[i_1], \ldots, D[i_n]$
where $\{i_1, \ldots, i_n\} = I \cap \{1, \ldots, |D|\}$ and $i_1 < \ldots < i_n$. Let $D[\mathcal{F}] =
D[i_1], \ldots, D[i_n]$ where $\{i_1, \ldots, i_n\} = \{\, i \mid \mathrm{Name}(D[i]) \in \mathcal{F} \,\}$ and $\mathrm{Name}(D[i_1]) <
\ldots < \mathrm{Name}(D[i_n])$ (“components in $D$ with names in $\mathcal{F}$”). Let $D[-\mathcal{F}] =
D[\{\, i \mid \mathrm{Label}(D[i]) = F \notin \mathcal{F} \,\}]$ (“components in $D$ with labels not in $\mathcal{F}$”).
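The collection operations above translate almost directly into code. The following Python sketch is illustrative only; the triple representation `(label, binder, body)` with `None` for the “no name” label and for the empty body is an assumption of this sketch, the helper names are hypothetical, and ordinary string comparison plays the role of the strict total order on component names.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical representation: (label, binder, body); None = "no name" / empty body.
Component = Tuple[Optional[str], str, Optional[str]]


def names(d: List[Component]) -> Set[str]:
    """Names(D): all labels except the 'no name' label."""
    return {label for label, _, _ in d if label is not None}


def binders(d: List[Component]) -> Set[str]:
    """Binders(D): the internal names bound by the collection."""
    return {binder for _, binder, _ in d}


def select_names(d: List[Component], fs: Set[str]) -> List[Component]:
    """D[F]: components with names in fs, ordered by name."""
    return sorted([c for c in d if c[0] in fs], key=lambda c: c[0])


def drop_names(d: List[Component], fs: Set[str]) -> List[Component]:
    """D[-F]: components with labels not in fs, in their original order."""
    return [c for c in d if c[0] not in fs]


D = [("g", "x", None), ("f", "w", "N(w,x,z)"), ("h", "y", None), (None, "z", "O(y)")]
```

For example, `select_names(D, {"h", "f"})` returns the `f` component followed by the `h` component, while `drop_names(D, {"f", "g"})` keeps `h` and the private component.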

The following terminology is defined. Let $c = (F \triangleright x = B)$ be a component
occurring at the top-level (not nested) in a collection $D$ (i.e., $c = D[i]$ for some
$i$). If the label $F = \mathrm{Label}(c)$ is a name $f$ (and $D$ belongs to a module), then $c$ can
be referred to by its name from outside the module for the purposes of linking,
selection, or hiding. In this case we may call $f$ an external name to distinguish
it from the binder $x$ which we may call an internal name. If $F$ is the anonymous
marker, written “$\_$”, then $c$ is unnamed and is only accessible (internally) via its
binder $x$. The variable $x = \mathrm{Binder}(c)$ is a binding occurrence of $x$ which binds
free occurrences of $x$ in the bodies of all of the components of $D$ to the body of
$c$. If $D$ is the environment of a letrec $(M \mid D)$, then the binder for $x$ also binds
free occurrences of $x$ in $M$. Non-binding variable occurrences are normal. The
body $B = \mathrm{Body}(c)$ is either a preterm $M$ or the empty body, written “$\bullet$”. The
component $c$ can be of four possible kinds, one of which will be forbidden:

- If $c = (f \triangleright x = M)$, then $c$ is an exported or output component.

- If $c = (f \triangleright x = \bullet)$, then $c$ is a deferred or input component.

- If $c = (\_ \triangleright x = M)$, then $c$ is private or a binding.

- If $c = (\_ \triangleright x = \bullet)$, then this is an error (forbidden below).

A module with input components is incomplete or abstract and otherwise is 
complete or concrete. 

The raw terms of the m-calculus (the members of RawTerm) are the preterms 
satisfying these conditions: (1) An unnamed component does not have an empty 
body. (2) Two named components in a collection do not have the same name. (3) 
Components in a collection bind distinct variables. (4) Components in a letrec 
environment are bindings (unnamed, non-empty bodies). 
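The four raw-term conditions can be read as a simple validity check on a single component collection. The Python sketch below is an illustration under assumptions of its own: it uses hypothetical `(label, binder, body)` triples with `None` for the “no name” label and the empty body, treats bodies as opaque strings, and therefore does not recurse into nested modules as a full check on preterms would.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (label, binder, body); None = "no name" / empty body.
Component = Tuple[Optional[str], str, Optional[str]]


def is_raw_collection(d: List[Component], in_letrec: bool = False) -> bool:
    """Check conditions (1)-(4) on one collection (no recursion into bodies)."""
    named = [label for label, _, _ in d if label is not None]
    bound = [binder for _, binder, _ in d]
    # (1) An unnamed component does not have an empty body.
    if any(label is None and body is None for label, _, body in d):
        return False
    # (2) Two named components do not have the same name.
    if len(named) != len(set(named)):
        return False
    # (3) Components bind distinct variables.
    if len(bound) != len(set(bound)):
        return False
    # (4) Components in a letrec environment are unnamed with non-empty bodies.
    if in_letrec and any(label is not None or body is None for label, _, body in d):
        return False
    return True
```

A collection such as `[("f", "w", None), (None, "z", "O(y)")]` passes as a module body, but not as a letrec environment, because it contains a named, deferred component.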

We use the following conventions for syntactic abbreviations. When writing
a member of Term (cf. Section 2.3), a component $(F \triangleright x = B)$ may be written
$(F \triangleright \_ = B)$ if no normal occurrences of $x$ are bound by the component’s
binder. A component $(\_ \triangleright x = B)$ may be written as $(x = B)$; a component
$(f \triangleright \_ = B)$ may be written $(f = B)$. The notation $M \backslash \{f_1, \ldots, f_n\}$ stands for
$M \backslash f_1 \backslash f_2 \cdots \backslash f_n$ where $f_1 < \cdots < f_n$. The expression (let $x = M$ in $M'$) stands
for $(M' \mid x = M)$, provided $x \notin \mathrm{FV}(M)$. Parentheses may be omitted; the possible
ambiguities are resolved by giving “$\backslash$”, “$\backslash{-}$”, and “.” higher precedence
than “$\oplus$” and making “$\oplus$” left associative.

The free variables of a raw term are defined thus: 

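\[
\begin{array}{l}
\mathrm{FV}(\bullet) = \emptyset \qquad \mathrm{FV}(x) = \{x\} \\
\mathrm{FV}(M \backslash f) = \mathrm{FV}(M \backslash{-\mathcal{F}}) = \mathrm{FV}(M.f) = \mathrm{FV}(M) \\
\mathrm{FV}(M_1 \oplus M_2) = \mathrm{FV}(M_1) \cup \mathrm{FV}(M_2) \\
\mathrm{FV}(\{D\}) = \mathrm{FV}(D) = \bigl(\bigcup_{1 \le i \le |D|} \mathrm{FV}(\mathrm{Body}(D[i]))\bigr) \setminus \mathrm{Binders}(D) \\
\mathrm{FV}((M \mid D)) = (\mathrm{FV}(M) \setminus \mathrm{Binders}(D)) \cup \mathrm{FV}(D)
\end{array}
\]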

The expression $\mathrm{Capture}_x(M)$ denotes the set of bound variables in raw term $M$
whose binding scope includes a free occurrence of the specific variable $x$. The
operation $M[\![x := y]\!]$ renames to $y$ all free occurrences of the variable $x$ in $M$
that are not in the scope of a binding of $y$.
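The free-variable equations can be checked on small examples with a direct recursive implementation. The sketch below is a hedged illustration: the tagged-tuple term representation (`"var"`, `"hide"`, `"sieve"`, `"select"`, `"link"`, `"module"`, `"letrec"`) and the use of `None` for the empty body are assumptions made here, not notation from the paper.

```python
from typing import Set

# Hypothetical term representation (tagged tuples):
#   ("var", x), ("hide", M, f), ("sieve", M, names), ("select", M, f),
#   ("link", M1, M2), ("module", D), ("letrec", M, D)
# A collection D is a list of (label, binder, body); body None is the empty body.


def fv_collection(d) -> Set[str]:
    """FV(D): free variables of all bodies, minus the binders of D."""
    free: Set[str] = set()
    for _, _, body in d:
        if body is not None:  # the empty body has no free variables
            free |= fv(body)
    return free - {binder for _, binder, _ in d}


def fv(m) -> Set[str]:
    """FV(M) for the term forms listed above."""
    tag = m[0]
    if tag == "var":
        return {m[1]}
    if tag in ("hide", "sieve", "select"):
        return fv(m[1])
    if tag == "link":
        return fv(m[1]) | fv(m[2])
    if tag == "module":
        return fv_collection(m[1])
    if tag == "letrec":
        d = m[2]
        return (fv(m[1]) - {binder for _, binder, _ in d}) | fv_collection(d)
    raise ValueError(f"unknown term tag: {tag}")


# {arg |> x = empty, res |> r = x}: the binder x captures the use of x.
identity = ("module", [("arg", "x", None), ("res", "r", ("var", "x"))])
# Linking with {arg |> a = y} and selecting res leaves y free.
applied = ("select", ("link", identity, ("module", [("arg", "a", ("var", "y"))])), "res")
```

Here `fv(identity)` is empty and `fv(applied)` is `{"y"}`, matching the equations for modules, linking and selection.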

A distinguished variable □, which is forbidden from being bound, is used as
the context hole. A context is a raw term with one occurrence of □. Let $C$ be
a metavariable over contexts. The result of replacing the hole in $C$ by the raw
term $M$ (without any variable renaming) is written $C[M]$.



2.2 Semantics: Structural and Computational Rewriting on Raw 
Terms 

A rule “$X \leadsto Y$ if $Z$” is a schema which defines a contraction relation $\leadsto$ such
that $M \leadsto N$ iff replacing the metavariables in $X$, $Y$, and $Z$ by syntactic constructs
of the appropriate sort yields, respectively, the terms $M$ and $N$ and a
true proposition. A rule schema of the form $D \leadsto D'$ abbreviates the pair of rule
schemas $\{D\} \leadsto \{D'\}$ and $(M \mid D) \leadsto (M \mid D')$. If a rewrite relation $\to$ is the
contextual closure of a contraction relation $\leadsto$, this means that $\to$ is the least
relation such that $M \leadsto N$ implies $C[M] \to C[N]$ for any context $C$.

The structural rewrite rules will use the following auxiliary definitions: 

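$\mathrm{UnsafeNames}(x, D) = \bigcup_{1 \le i \le |D|} \mathrm{Capture}_x(\mathrm{Body}(D[i])) \cup \mathrm{FV}(D) \cup \mathrm{Binders}(D)$

$\mathrm{UnsafeNames}(x, \{D\}) = \mathrm{UnsafeNames}(x, D)$

$\mathrm{UnsafeNames}(x, (M \mid D)) = \mathrm{Capture}_x(M) \cup \mathrm{FV}(M) \cup \mathrm{UnsafeNames}(x, D)$

$\mathrm{BinderRenamed}(i, x, y, D, D') \iff$
  $D = (F_1 \triangleright x_1 = B_1), \ldots, (F_i \triangleright x = B_i), \ldots, (F_n \triangleright x_n = B_n)$,
  and $D' = (F_1 \triangleright x_1 = B_1'), \ldots, (F_i \triangleright y = B_i'), \ldots, (F_n \triangleright x_n = B_n')$,
  and $B_j' = B_j[\![x := y]\!]$ for $1 \le j \le n$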

The structural rewrite rules are as follows: 

(α-letrec) $(M \mid D) \leadsto (M[\![x := y]\!] \mid D')$
  if $y \notin \mathrm{UnsafeNames}(x, (M \mid D))$,
     $\mathrm{BinderRenamed}(i, x, y, D, D')$

(α-module) $\{D\} \leadsto \{D'\}$
  if $y \notin \mathrm{UnsafeNames}(x, \{D\})$,
     $\mathrm{BinderRenamed}(i, x, y, D, D')$

(comp-order) $D_1, c_1, D_2, c_2, D_3 \leadsto D_1, c_2, D_2, c_1, D_3$

(link-commute) $M_1 \oplus M_2 \leadsto M_2 \oplus M_1$



The computational rewrite rules, which are presented in Figure 1, use the
following auxiliary definitions. The expression $\mathrm{PickBody}(B, B')$ yields $B$ if $B' = \bullet$,
$B'$ if $B = \bullet$, and is otherwise undefined. $\mathrm{DependsOn}_D$ is the least transitive,
reflexive relation on $\{1, \ldots, |D|\}$ such that for all $i, j \in \{1, \ldots, |D|\}$,

$\mathrm{DependsOn}_D(i, j) \Longleftarrow \mathrm{Binder}(D[j]) \in \mathrm{FV}(\mathrm{Body}(D[i]))$
  or $(\mathrm{Body}(D[i]) = \bullet$ and $\mathrm{Label}(D[j]) \neq \_)$



The structural and computational contraction relations, $\leadsto_S$ and $\leadsto_C$, are
respectively the unions of the contraction relations of the structural and computational
rules. The structural and computational rewrite relations, $\dashrightarrow_S$ and
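(link) $\{D\} \oplus \{D'\} \leadsto \{D[-\mathcal{F}],\ D'[-\mathcal{F}],\ D''\}$
  if $\mathcal{F} = \{f_1, \ldots, f_n\} = \mathrm{Names}(D) \cap \mathrm{Names}(D')$,
     $\mathrm{Binders}(D[-\mathcal{F}]) \cap (\mathrm{Binders}(D') \cup \mathrm{FV}(D')) = \emptyset$,
     $\mathrm{Binders}(D'[-\mathcal{F}]) \cap (\mathrm{Binders}(D) \cup \mathrm{FV}(D)) = \emptyset$,
     $D[\mathcal{F}] = (f_1 \triangleright x_1 = B_1), \ldots, (f_n \triangleright x_n = B_n)$,
     $D'[\mathcal{F}] = (f_1 \triangleright x_1 = B_1'), \ldots, (f_n \triangleright x_n = B_n')$,
     $D'' = (f_1 \triangleright x_1 = B_1''), \ldots, (f_n \triangleright x_n = B_n'')$,
     $B_i'' = \mathrm{PickBody}(B_i, B_i')$ is defined for $1 \le i \le n$

(subst) $D \leadsto D[i := (F_i \triangleright x_i = C[M_j])]$
  if $D[i] = (F_i \triangleright x_i = C[x_j])$,
     $D[j] = (F_j \triangleright x_j = M_j)$,
     $\mathrm{Capture}_\Box(C) \cap (\{x_j\} \cup \mathrm{FV}(M_j)) = \emptyset$,
     not $\mathrm{DependsOn}_D(j, i)$

(subst-letrec) $(C[x] \mid D) \leadsto (C[M] \mid D)$
  if $D[i] = (\_ \triangleright x = M)$ for some $i$,
     $\mathrm{Capture}_\Box(C) \cap (\{x\} \cup \mathrm{FV}(M)) = \emptyset$

(select) $\{D\}.f \leadsto (x_i \mid D')$
  if $D = (F_1 \triangleright x_1 = M_1), \ldots, (f \triangleright x_i = M_i), \ldots, (F_n \triangleright x_n = M_n)$,
     $D' = (\_ \triangleright x_1 = M_1), \ldots, (\_ \triangleright x_i = M_i), \ldots, (\_ \triangleright x_n = M_n)$

(gc-module) $\{D\} \leadsto \{D[I]\}$
  if $I, J$ partition $\{1, \ldots, |D|\}$,
     $J \neq \emptyset$,
     $\mathrm{Binders}(D[J]) \cap \mathrm{FV}(D[I]) = \emptyset$,
     $\mathrm{Names}(D[J]) = \emptyset$

(gc-letrec) $(M \mid D) \leadsto (M \mid D[I])$
  if $I, J$ partition $\{1, \ldots, |D|\}$,
     $J \neq \emptyset$,
     $\mathrm{Binders}(D[J]) \cap (\mathrm{FV}(M) \cup \mathrm{FV}(D[I])) = \emptyset$

(empty-letrec) $(M \mid\ ) \leadsto M$

(closure) $(\{D\} \mid D') \leadsto \{D, D'\}$
  if $|D'| > 0$,
     $\mathrm{Binders}(D) \cap (\mathrm{Binders}(D') \cup \mathrm{FV}(D')) = \emptyset$

(hide-present) $\{D[i := (f \triangleright x = M)]\} \backslash f \leadsto \{D[i := (\_ \triangleright x = M)]\}$

(hide-absent) $\{D\} \backslash f \leadsto \{D\}$
  if $f \notin \mathrm{Names}(D)$

(sieve) $\{D\} \backslash{-\mathcal{F}} \leadsto \{D'\}$
  if $D = (F_1 \triangleright x_1 = B_1), \ldots, (F_n \triangleright x_n = B_n)$,
     $D' = (F_1' \triangleright x_1 = B_1), \ldots, (F_n' \triangleright x_n = B_n)$,
     $F_i' = \_$ if $F_i \notin \mathcal{F}$ and $B_i \neq \bullet$, and $F_i' = F_i$ if $F_i \in \mathcal{F}$, for $1 \le i \le n$

Fig. 1. The computational rewrite rules.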
$\dashrightarrow_C$, are the contextual closures of $\leadsto_S$ and $\leadsto_C$, respectively. The structural
equivalence relation, $=_S$, is the transitive, reflexive, and symmetric closure of
$\dashrightarrow_S$. The (combined) contraction relation on raw terms is $\leadsto\ =\ \leadsto_S \cup \leadsto_C$ and
the (combined) rewrite relation on raw terms is $\dashrightarrow\ =\ \dashrightarrow_S \cup \dashrightarrow_C$. The relations
$\dashrightarrow_S^{*}$, $\dashrightarrow_C^{*}$, and $\dashrightarrow^{*}$ are the transitive, reflexive closures respectively of $\dashrightarrow_S$,
$\dashrightarrow_C$, and $\dashrightarrow$.

While variables are subject to α-conversion, component names are not. This
is similar to the way that a linker freely relocates (renames) offsets (internal
names) within object files as necessary but does not generally rename symbol
table entries (external names).

In the presence of cyclic bindings, the usual meta-level substitution and explicit
substitution both result in size explosions and generally fail to provide the desired
equations between programs. To avoid these difficulties, unlike the calculus
of Ancona and Zucca [5], the m-calculus substitutes for one target at a time (via
the (subst) and (subst-letrec) rules) in a style pioneered by Ariola, Blom, and
Klop [8, 6, 7]. The m-calculus letrec construct is, in a sense, a delayed substitution
that allows avoiding duplication when a component is selected from a module.

The (subst) rule in Figure 1 uses the notion of one component of a collection
depending on another to exclude certain rewriting possibilities. Without
this condition of the (subst) rule, the m-calculus would not be confluent and
would need a more complicated method as in [35] to prove soundness. Read
$\mathrm{DependsOn}_D(j, i)$ as “component $D[j]$ depends on component $D[i]$ in collection
$D$”. The first condition of $\mathrm{DependsOn}_D$ handles syntactically evident dependencies.
The second condition handles the possibility that a dependency will arise
after linking the module $\{D\}$ with another module. Every input component is
presumed to (potentially) depend on every output component, because there is
always a module to link with that will cause the dependency to become real.
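To make the dependency relation concrete, here is a small Python sketch that computes it for one collection. It is an illustration under simplifying assumptions that are not part of the paper: each component is a hypothetical `(label, binder, free_vars)` triple, where `free_vars` is the precomputed set of free variables of the body and `None` marks the empty body, and indices are 0-based. The base cases follow the two conditions as stated in the definition of DependsOn, and the closure is computed with Warshall's algorithm.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical representation: (label, binder, free variables of the body);
# label None = "no name", free-variable set None = empty (deferred) body.
Component = Tuple[Optional[str], str, Optional[Set[str]]]


def depends_on(d: List[Component]) -> Set[Tuple[int, int]]:
    """Reflexive, transitive closure of the two base dependency conditions.

    (i, j) in the result means: component i depends on component j.
    """
    n = len(d)
    rel = {(i, i) for i in range(n)}  # reflexivity
    for i, (_, _, fv_i) in enumerate(d):
        for j, (label_j, binder_j, _) in enumerate(d):
            if fv_i is None:
                # A deferred component may come to depend on any named one.
                if label_j is not None:
                    rel.add((i, j))
            elif binder_j in fv_i:
                # Syntactically evident dependency.
                rel.add((i, j))
    # Transitive closure (Warshall's algorithm).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if (i, k) in rel and (k, j) in rel:
                    rel.add((i, j))
    return rel


# {f |> w = empty, g |> x = M(y), _ |> y = 5}, with 0-based indices 0, 1, 2.
D = [("f", "w", None), ("g", "x", {"y"}), (None, "y", set())]
REL = depends_on(D)
```

In this example the deferred component `f` depends on `g` and, transitively, on the private component bound to `y`. The private component does not depend on `g`, so substituting its body for `y` inside `g` is not blocked by the dependency condition of (subst).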

Most of the side conditions of the computational rules which concern the 
names of bound variables can be met by applying the structural rules first. This 
is the case for the use of Binders by (link) and (closure), the use of Capture by 
(subst) and (subst-letrec), and the way that (link) ensures that the binders 
of common components have the same name before linking. The side condition 
in (closure) that the component collection is non-empty merely avoids a trivial 
critical pair with (empty-letrec), making proofs easier.

The possible dynamic errors that can occur during computation in the m-calculus
are (1) linking two modules whose output components are not disjoint,
(2) selecting a component from an incomplete module, (3) selecting a component
named $f$ from a module which has no component named $f$, (4) hiding an input
component, and (5) sieving out an input component. The following are examples
of each of the kinds of errors:

(1) $\{f \triangleright w = \bullet,\ g \triangleright x = M\} \oplus \{f \triangleright y = N,\ g \triangleright z = N'\}$

(2) $\{f \triangleright w = \bullet,\ g \triangleright x = M\}.g$

(3) $\{f \triangleright w = M,\ g \triangleright x = N\}.h$

(4) $\{f \triangleright w = \bullet,\ g \triangleright x = N\} \backslash f$

(5) $\{f \triangleright w = \bullet,\ g \triangleright x = N\} \backslash{-\{g\}}$
2.3 The Calculus: Terms and Rewriting

The actual m-calculus is defined as $\mathcal{M} = (\mathrm{Term}, \to) = (\mathrm{RawTerm}, \dashrightarrow_C)/{=_S}$.
By this we mean that:

- The set Term of (real) terms is the set of equivalence classes of the raw terms
under $=_S$ (the structural equivalence relation).

- A term $[M]_{=_S}$ (the equivalence class of raw term $M$ under $=_S$) rewrites to
a term $[N]_{=_S}$, written $[M]_{=_S} \to [N]_{=_S}$, iff there are raw terms $M' \in [M]_{=_S}$
and $N' \in [N]_{=_S}$ such that $M' \dashrightarrow_C N'$.

We assume throughout that raw terms are implicitly coerced to (real) terms when
placed in a context requiring a term, e.g., $M \to N$ means $[M]_{=_S} \to [N]_{=_S}$.
Let $\twoheadrightarrow$ be the transitive, reflexive closure of $\to$.



3 Encoding Features in the m-Calculus

This section illustrates smooth encodings of various program construction me- 
chanisms in the m-calculus. 



3.1 Functions (λ-Calculus)

We define the λ-calculus as syntactic sugar for m-calculus terms as follows, where
“arg” and “res” are fixed component names (meaning “argument” and “result”):

$(\lambda x.M) = \{\mathrm{arg} \triangleright x = \bullet,\ \mathrm{res} = M\}$

$(M\,M') = (M \oplus \{\mathrm{arg} = M'\}).\mathrm{res}$



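This encoding is faithful to the meaning of the λ-calculus. We can verify the
simulation of β-reduction as follows (where $M[x := M']$ is defined appropriately):

\[
\begin{array}{cll}
 & (\lambda x.M)\,M' & \\
= & (\{\mathrm{arg} \triangleright x = \bullet,\ \mathrm{res} = M\} \oplus \{\mathrm{arg} = M'\}).\mathrm{res} & \\
= & (\{\mathrm{arg} \triangleright x = \bullet,\ \mathrm{res} \triangleright y = M\} \oplus \{\mathrm{arg} \triangleright x = M'\}).\mathrm{res} & \\
 & \quad \text{where } y \notin \mathrm{FV}(M) \cup \mathrm{FV}(M') \text{ and } x \notin \mathrm{FV}(M') & \\
\to & \{\mathrm{arg} \triangleright x = M',\ \mathrm{res} \triangleright y = M\}.\mathrm{res} & \text{(link)} \\
\to & (y \mid x = M',\ y = M) & \text{(select)} \\
\to & (M \mid x = M',\ y = M) & \text{(subst-letrec)} \\
\to & (M \mid x = M') & \text{(gc-letrec)} \\
\twoheadrightarrow & (M[x := M'] \mid x = M') & \text{(subst-letrec)} \\
\to & (M[x := M'] \mid\ ) & \text{(gc-letrec)} \\
\to & M[x := M'] & \text{(empty-letrec)}
\end{array}
\]

This encoding is similar to an independently developed encoding in [5]. It is only
superficially related to the encoding of λ-calculus in ς-calculus [3].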
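The encoding of functions is purely syntactic, so it can be written as a short translation function. The Python sketch below is an illustration with assumptions of its own: terms are hypothetical tagged tuples, components are `(label, binder, body)` triples with `None` for the empty body, and the binder `"_"` stands for the abbreviation in which a component's internal name is unused (a full implementation would generate a fresh variable instead).

```python
# Hypothetical tagged-tuple syntax (an assumption of this sketch):
#   lambda terms:  ("var", x), ("lam", x, body), ("app", fun, arg)
#   m-calculus:    ("var", x), ("module", D), ("link", M, N), ("select", M, f)
# A collection D is a list of (label, binder, body); body None is the empty body.


def encode(t):
    """Translate a lambda term into an m-calculus term via arg/res components."""
    tag = t[0]
    if tag == "var":
        return t
    if tag == "lam":
        _, x, body = t
        # (lambda x. M) = {arg |> x = empty, res = M}
        return ("module", [("arg", x, None), ("res", "_", encode(body))])
    if tag == "app":
        _, fun, arg = t
        # (M M') = (M (+) {arg = M'}).res
        supplied = ("module", [("arg", "_", encode(arg))])
        return ("select", ("link", encode(fun), supplied), "res")
    raise ValueError(f"unknown lambda term tag: {tag}")


# (lambda x. x) y
example = encode(("app", ("lam", "x", ("var", "x")), ("var", "y")))
```

The result for `(λx.x) y` is a selection of `res` from the link of the abstraction's module with a module that exports `arg`, which is the starting point of the derivation above.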
3.2 Records and Record Operations

By the syntactic abbreviations defined in Section 2, record syntax is already
accepted by the m-calculus. Furthermore, the expected rewrite rule for selection
is simulated.

$\{f_1 = M_1, \ldots, f_n = M_n\}.f_i \twoheadrightarrow M_i$ if $1 \le i \le n$

The simulation uses (select), (gc-letrec) (which can be applied because the
internal names are not used), and (empty-letrec).

3.3 Objects (ς-Calculus)

The following record-of-methods encoding for the ς-calculus [3] works fine. We
write “!” for the method invocation operator to avoid confusion with our component
selection operator “.”.

$[f_1 = \varsigma(x)M_1, \ldots, f_n = \varsigma(x)M_n] = \{f_1 = \lambda x.M_1, \ldots, f_n = \lambda x.M_n\}$

$(M \Leftarrow f = \varsigma(x)M') = M \backslash f \oplus [f = \varsigma(x)M']$

$M!f = (\mathrm{let}\ x = M\ \mathrm{in}\ (x.f)\,x)$ where $x$ is fresh

It is not hard to verify that the rewrite rules of the ς-calculus are simulated:

$M!f_i \twoheadrightarrow M_i[x := M]$
  where $M = [f_1 = \varsigma(x)M_1, \ldots, f_n = \varsigma(x)M_n]$ and $1 \le i \le n$

$[f_1 = \varsigma(x)M_1, \ldots, f_n = \varsigma(x)M_n] \Leftarrow f_i = \varsigma(x)M'$
  $\twoheadrightarrow [f_1 = \varsigma(x)M_1, \ldots, f_i = \varsigma(x)M', \ldots, f_n = \varsigma(x)M_n]$ where $1 \le i \le n$

Of course, the real difficulty in dealing with objects is not in expressing their 
computational meaning but rather in devising the type system, an issue which 
we do not address in this paper. 

3.4 Modules 

C-style The m-calculus directly supports the modules of C-like languages. (The
call-by-value evaluation and imperative features of C are left to future work.)
Each object file O can be represented as a module M, and the linking of the
modules $M_1, \ldots, M_n$ to form a program is represented as $P = (M_1 \oplus \cdots \oplus M_n)$.
Invoking the program start routine is represented as $(P.\mathrm{main})$.

Package-style For the package style of module system, a module named A
which imports modules named $P_1, \ldots, P_n$ and exports entities named $f_1, \ldots, f_m$
is represented by an m-calculus module with one output component named
A, and $n$ input components named $P_1, \ldots, P_n$. The output component is in
turn a module with $m$ output components named $f_1, \ldots, f_m$ and some number
of private components. The linking of modules $M_1, \ldots, M_n$ to form a program
is again represented as $P = (M_1 \oplus \cdots \oplus M_n)$. Invoking the start routine of
the program is now represented as $(P.\mathrm{Main}.\mathrm{main})$, i.e., there is a distinguished
module named “Main” which must export a component named “main”.

Consider for example the following Haskell program, where P(A.f) is an 
expression mentioning A.f and similarly for Q(B.f ,B.g) and N: 

    module A (f) where
    f = N

    module B (f, g) where
    import A
    g = P(A.f)

    module Main (main) where
    import qualified B
    f = 5
    main = Q(B.f,B.g,f)

This program can be encoded in the m-calculus with these three modules, where 
A, B, main, and Main are component names: 

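$$M_A = \{A = \{f = N\}\}$$

$$M_B = \{A \triangleright x = \bullet,\ B = \{g = P(x.f),\ f = x.f\}\}$$

$$M_{\mathit{Main}} = \{B \triangleright x = \bullet,\ \mathit{Main} = \{y = 5,\ \mathit{main} = Q(x.f, x.g, y)\}\}$$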

Note that the unexported “f” definition in Main is handled by a private com- 
ponent, so a variable “y” must be used instead of a component name. We can 
check the meaning of the program by rewriting: 

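$$\begin{aligned}
& (M_A \oplus M_B \oplus M_{\mathit{Main}}) \\
\twoheadrightarrow{} & \{A \triangleright x = \{f = N\},\ B \triangleright z = \{g = P(x.f),\ f = x.f\},\ \mathit{Main} = \{y = 5,\ \mathit{main} = Q(z.f, z.g, y)\}\} \\
\twoheadrightarrow{} & \{A = \{f = N\},\ B = \{g = P(N),\ f = N\},\ \mathit{Main} = \{\mathit{main} = Q(N, P(N), 5)\}\}
\end{aligned}$$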

Thus, the overall meaning of the program is given by: 

$$(M_A \oplus M_B \oplus M_{\mathit{Main}}).\mathit{Main}.\mathit{main} \twoheadrightarrow Q(N, P(N), 5)$$
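
The package-style encoding of this program can be mimicked by a minimal Python sketch. It assumes a much simplified model in which a module is a dictionary of output components and each component body is a function that receives the linked program in order to read its input components; `link`, `select`, and the `body_*` functions are hypothetical names, and N, P, and Q are symbolic stand-ins because the text leaves them abstract.

```python
def link(*modules):
    """Merge the output components; raise if two modules define the same name."""
    program = {}
    for module in modules:
        for name, body in module.items():
            if name in program:
                raise ValueError(f"linking is stuck: duplicate component {name!r}")
            program[name] = body
    return program


def select(program, name):
    """Evaluate component `name`; its body receives the program to read its inputs."""
    return program[name](program)


# N, P, and Q are abstract in the text; symbolic stand-ins are used here.
N = "N"

def P(a):
    return f"P({a})"

def Q(a, b, c):
    return f"Q({a}, {b}, {c})"


def body_A(program):
    return {"f": N}

def body_B(program):
    x = select(program, "A")            # input component A, internal name x
    return {"g": P(x["f"]), "f": x["f"]}

def body_Main(program):
    x = select(program, "B")            # input component B, internal name x
    y = 5                               # private component for the unexported f
    return {"main": Q(x["f"], x["g"], y)}


M_A, M_B, M_Main = {"A": body_A}, {"B": body_B}, {"Main": body_Main}
program = link(M_A, M_B, M_Main)
print(select(program, "Main")["main"])  # Q(N, P(N), 5)
```

The sketch evaluates on demand rather than by rewriting, so it only reproduces the final result of the derivation, not the intermediate terms.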

In the Haskell example above, we used qualified names of the form A.f. In 
module B we could have used the unqualified name f to refer to the entity A.f. 
When a module imports more than one other module, a Haskell implementation 
uses its knowledge of the imported modules to determine the correct meaning 
of unqualified names. To encode Haskell modules into the m-calculus, we could 
use a translation that fully qualifies all names in each module using information about 
the entire program. 

However, it is desirable to reason about unqualified names in order to reason 
about modules separately. Consider for example the above Haskell program with 
module B replaced by the following modules: 

    module B (f, g, i) where
    import A
    import C
    i = 10
    g = P(f,h,i)

    module C (h) where
    h = R







The name f in module B will end up referring to A.f, because there is no C.f, 
but this cannot be determined without inspecting modules A and C. The name 
i in module B will only be legal if A.i and C.i do not exist. We can encode these 
modules as the following (extended) m-calculus modules: 

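$$\begin{aligned}
M'_B &= \{A \triangleright y = \bullet,\ C \triangleright z = \bullet,\ B \triangleright w = \{i = 10,\ g = P(x.f, x.h, x.i),\ f = x.f\}, \\
&\qquad x = (y \mathbin{\backslash\text{-}} \{f, h, i\}) \oplus (z \mathbin{\backslash\text{-}} \{f, h, i\}) \oplus (w \mathbin{\backslash\text{-}} \{f, h, i\})\} \\
M_C &= \{C = \{h = R\}\}
\end{aligned}$$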

The key idea of this encoding is adding the extra private component defining x 
to automatically resolve the unqualified names by picking them from whichever 
module is supplying them. Then we can verify that: 

$$(M_A \oplus M'_B \oplus M_C \oplus M_{\mathit{Main}}).\mathit{Main}.\mathit{main} \twoheadrightarrow Q(N, P(N, R, 10), 5)$$

In the above example, observe that if $M'_B$ is linked with two modules $M'_A$ and 
$M'_C$ whose A and C components both supply f, then the linking operation in $M'_B$ 
which yields the private definition of x will get stuck. This corresponds to the 
fact that this is (usually) illegal in Haskell. (It is legal in Haskell for modules B 
and C to import module A and export A.f, and for module D to import both B 
and C and refer to the unqualified name f, because both B.f and C.f are aliases 
for A.f. It seems that the m-calculus would need to be extended to reason about 
sharing in order to encode this behavior.) 
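
The idea of resolving unqualified names through an extra private component can be sketched in Python. The sketch assumes that the operator applied to y, z, and w keeps only the listed names, and it simplifies the encoding by passing only the locally defined field i of module B rather than the whole of w; `sieve`, `link`, and the variable names are hypothetical.

```python
def sieve(record, names):
    """Keep only the fields of `record` whose labels are in `names`."""
    return {label: value for label, value in record.items() if label in names}


def link(*records):
    """Disjoint union of records; linking is stuck if a label is supplied twice."""
    result = {}
    for record in records:
        for label, value in record.items():
            if label in result:
                raise ValueError(f"linking is stuck: {label!r} supplied twice")
            result[label] = value
    return result


NAMES = {"f", "h", "i"}          # unqualified names used in module B
exports_A = {"f": "N"}           # bound to y in the encoding
exports_C = {"h": "R"}           # bound to z in the encoding
local_B = {"i": 10}              # simplification: only B's own definition of i

x = link(sieve(exports_A, NAMES), sieve(exports_C, NAMES), sieve(local_B, NAMES))
g = f"P({x['f']}, {x['h']}, {x['i']})"
print(g)                         # P(N, R, 10)

# If C also supplied f, the definition of x could not be formed.
clashing_C = {"h": "R", "f": "other"}
try:
    link(sieve(exports_A, NAMES), sieve(clashing_C, NAMES), sieve(local_B, NAMES))
    stuck = False
except ValueError:
    stuck = True
print(stuck)                     # True
```

The failure in the second call plays the role of the stuck linking operation: a name supplied by two imported modules cannot be resolved without further information.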

The Haskell module system has other features such as the ability to list which 
entities to import from a module, the ability to list entities not to import with 
unqualified names, local aliases for imported modules, and the ability to reexport 
all of the entities imported from another module. All of these features can be 
represented in the m-calculus. 



ML-style The m-calculus can also represent the type-free aspects of ML-style 
modules. (The types, call-by-value evaluation, and imperative features of ML are 
left to future work.) Such module systems provide modules called structures as 
well as a $\lambda$-calculus (functors and functor applications) for manipulating them. 
A structure is essentially a dependent record; it is dependent in the sense that 
later fields can refer to the values of earlier fields. A functor is essentially a 
$\lambda$-abstraction whose body denotes a structure; a functor definition is the top-level 
binding of a functor to its name. ML structures can be encoded in the m-calculus 
as concrete modules. ML functors and functor applications can be encoded in 
the m-calculus via the $\lambda$-calculus encoding given in Section 3.1. 
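
The two notions can be illustrated with a short Python sketch: a structure as a record whose later fields may read earlier ones, and a functor as an ordinary function from structures to structures. The names `structure`, `make_counter`, and `ints` are hypothetical, and Python evaluates eagerly, so the sketch shows only the shape of the encoding and not the evaluation order of the m-calculus.

```python
def structure(*fields):
    """Build a dependent record: each field body sees the fields defined before it."""
    record = {}
    for label, body in fields:
        record[label] = body(dict(record))
    return record


def make_counter(base):
    """A functor: a function from an argument structure to a result structure."""
    return structure(
        ("start", lambda earlier: base["zero"]),
        ("next", lambda earlier: earlier["start"] + base["step"]),
    )


ints = structure(("zero", lambda earlier: 0), ("step", lambda earlier: 2))
counter = make_counter(ints)     # functor application
print(counter)                   # {'start': 0, 'next': 2}
```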



4 The Well-Behavedness of the Rewrite Rules 

This section sketches the proof that the m-calculus is not only confluent but 
that it also satisfies the finite developments property. Due to space limitations, 
the details are only in the long version [44]. 
Proving these results uses a variation of the m-calculus which adds redex 
marks for tracking residuals of redexes of the computational rules and pre- 
venting contraction of freshly created redexes. Redexes of the (link), (select), 
(empty-letrec), (closure), (hide-present), (hide-absent), and (sieve) rules 
are marked at the root in the usual way. Redexes of (subst) and (subst-letrec) 
are marked at the variable which is the substitution target rather than at the 
root. Redexes of (gc-module) and (gc-letrec) are also not marked at the root; 
instead each component that can be garbage collected is marked. All marks are 
0 except for substitution marks which must be 1 greater than all of the marks in 
the substitution source component body. (Due to the side condition on (subst) 
using DependsOn, it is always possible to mark all redexes in a term.) 

Strong normalization (termination of rewriting) of the marked m-calculus is 
proved using a decreasing measure, the multiset of all marks in the term, in 
the well-founded multiset ordering. Weak confluence of the marked m-calculus 
is proved by several lemmas established by careful case analyses together with a 
top-level proof structure that separately considers structural and computational 
rewrite steps. Our proof deals with and accounts for every structural operation 
(i.e., $\alpha$-conversion and re-ordering) explicitly. 
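
To make the termination measure concrete, the following Python sketch implements the standard multiset extension of the order on natural numbers (the Dershowitz-Manna ordering). The function name `multiset_greater` and the sample multisets are illustrative assumptions and are not taken from the paper; the example only shows the kind of step the measure is designed for, where one mark is replaced by several strictly smaller ones.

```python
from collections import Counter


def multiset_greater(m, n):
    """Return True if multiset m is strictly greater than multiset n.

    m > n holds when m != n and every element that occurs more often in n
    than in m is dominated by a larger element that occurs more often in m.
    """
    m, n = Counter(m), Counter(n)
    if m == n:
        return False
    elements = set(m) | set(n)
    return all(
        any(m[b] > n[b] and b > a for b in elements)
        for a in elements
        if n[a] > m[a]
    )


# Illustrative step: a mark 2 disappears and copies of smaller marks appear.
before = [2, 0]
after = [1, 1, 0, 0]
print(multiset_greater(before, after))   # True
print(multiset_greater(after, before))   # False
```

The multiset can grow in size while still decreasing in the ordering, which is why copying a component body during substitution does not break the termination argument as long as the copied marks are smaller than the mark that is consumed.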

The combination of strong normalization and weak confluence of the marked 
m-calculus yields confluence of the marked m-calculus. Then developments are 
defined as those rewrite sequences of the m-calculus that can be lifted to the 
marked m-calculus. Using the confluence of the marked m-calculus, we prove 
that the results of any two coinitial developments can be joined by two further 
developments. Standard techniques then finish the proof of confluence of the m- 
calculus. Confluence is shown both for — > (on terms) and — ■» (on raw terms). 
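
The first step of this argument, from strong normalization and weak confluence to confluence, is the standard result known as Newman's lemma. The statement below uses generic notation that is assumed here rather than taken from the paper ($\to$ for one rewrite step, $\twoheadrightarrow$ for its reflexive-transitive closure), and it ignores the explicit treatment of structural steps.

```latex
\textbf{Lemma (Newman).} Let $\to$ be strongly normalizing. If
\[
  \forall a, b, c.\ (a \to b \,\wedge\, a \to c)
  \implies \exists d.\ (b \twoheadrightarrow d \,\wedge\, c \twoheadrightarrow d),
\]
then
\[
  \forall a, b, c.\ (a \twoheadrightarrow b \,\wedge\, a \twoheadrightarrow c)
  \implies \exists d.\ (b \twoheadrightarrow d \,\wedge\, c \twoheadrightarrow d).
\]
```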



5 Related Work 

5.1 Calculi with Linking 

Cardelli presents a simply-typed linking calculus for outermost-only modules 
without recursion [14]. Drossopoulou, Eisenbach, and Wragg give a module calculus 
for reasoning about the quirks of Java [16]. Ancona and Zucca give a calculus 
for linking modules which, although similar to ours, has a notion of substitution 
which we believe is less convenient and no published proof of rewriting pro- 
perties [5]. Earlier, Ancona and Zucca also presented an algebra for simplifying 
module expressions which is not powerful enough to represent general compu- 
tation [4]. Machkasova and Turbak give a calculus for linking outermost-only 
modules in a call-by-value language [35]. 

From a non-equational-reasoning point of view, Flatt and Felleisen give a 
calculus of modules with similar capabilities to ours [21]. Glew and Morrisett 
present a module calculus tailored towards dealing with linking of object files 
containing assembly-language-level code [24]. Waddell and Dybvig show how to 
encode modules and linking using Scheme’s macro system [42]. 
5.2 Mixins 

Duggan and Sourelis present a system of “mixin modules” which has the unique 
feature that when both modules have components with the same name, linking 
the modules results in a form of merging of the same-named components [17, 
18]. Bracha and Lindstrom encode mixins using $\lambda$-calculus, records, and fixpoint 
operators [13, 12]. Findler and Flatt describe using mixins and incomplete 
modules in actual programming [19]. Flatt, Krishnamurthi, and Felleisen 
present a calculus with an operational semantics for mixins and classes in the 
context of Java [22]. 

5.3 Calculi for Cycles 

Inspiring much of our formulation, Ariola and Klop did ground-breaking work 
on reasoning about $\lambda$-terms combined with a construct for mutually recursive 
definitions [8]. Ariola and Blom refined this work to prove consistency in the 
absence of confluence [6, 7]. 

5.4 ML-Style Modules vs. Types 

Crary, Harper, and Puri describe how to extend the ML module system to deal 
with recursion [15]. Earlier work to add first-class modules (i.e., higher-order 
functors) to ML includes that of Russo [41], Harper and Lillibridge [27, 34], and 
Leroy [33]. Harper, Mitchell, and Moggi devised the phase distinction to show the 
decidability of type checking for the ML module system [28]. Jones shows how 
to avoid much of the complexity of typing ML-style modules via higher-order 
(parametric) signatures [31, 30]. 



5.5 Types vs. Concatenation and Extension for Records and 
Objects 

When we extend our system with types, we will closely consider previous work on 
types for record concatenation [43, 29], extensible records [39, 23], and extensible 
objects [20, 40, 11]. 